Several experiments were carried out to determine
whether learning disabled (LD) and behaviorally disordered (BD)
students exhibit deficiencies in appropriate test-taking strategies
and, if so, whether these strategies could be successfully trained.
Preliminary investigations indicated that mildly handicapped students
do exhibit deficiencies in this area, including attention to
inappropriate distractors, failure to successfully employ prior
knowledge and deductive reasoning strategies, and failure to identify
correctly specific types of questions which call for different
strategies. Deficiencies were also observed regarding use of separate
answer sheets and expressed attitudes toward tests. In year 1,
approximately 100 LD and BD elementary Ss (grades 2-4) were randomly
assigned to treatment (training on test-taking skills) or control
conditions. All Ss scored significantly higher on a test of
test-taking skills. During year 2, approximately 100 LD and BD Ss
(grades 4-6) were randomly assigned to treatment (training involving
both reading and math subtest areas of the Stanford Achievement
Test) or control conditions. Trained Ss scored significantly higher
on two subtests and descriptively higher on a third subtest.
Extensive appended material includes 19 items (journal articles,
conference papers, and manuscripts unpublished or submitted for
publication) on test-taking skills and their implications for LD and
BD students. (CL)

Abstract

Several experiments were carried out over the course of a 24-month
period to determine whether (a) learning disabled (LD) and
behaviorally disordered (BD) students exhibit deficiencies with
respect to appropriate test-taking strategies and, if so, (b)
whether these strategies could be successfully trained.

Preliminary investigations indicated that mildly handicapped
students do exhibit deficiencies in the area of test-taking
strategies. These deficiencies include attention to inappropriate
distractors, failure to successfully employ prior knowledge and
deductive reasoning strategies, and failure to identify correctly
specific types of questions which call for different strategies.
In addition, deficiencies were observed with respect to use of
separate answer sheets and expressed attitudes toward tests. In
the final Year 1 test-training evaluation, approximately 100 LD and
BD elementary-age students representing grades 2, 3, and 4 were
randomly assigned to treatment and control conditions. Treatment
subjects received eight training sessions on test-taking skills
with particular regard to the Stanford Achievement Test (SAT).
All students scored significantly higher on a test of test-taking
skills. In addition, third and fourth grade LD and BD students
scored significantly higher on the Word Study Skills subtest and
exhibited descriptive increases over the control group with
respect to other subtests. Second grade students were apparently
unaffected by the training procedure. In addition, a similar
test-training package applied to intact third grade classrooms of
mostly nonhandicapped students indicated that these materials were
successful in improving student attitudes toward the test-taking
experience.

During the Year 2 test training evaluation, approximately 100 LD
and BD fourth, fifth, and sixth grade students were randomly
assigned to treatment and control conditions. Treatment condition
subjects received five days of training on revised and extended
training materials which involved both reading and math subtest
areas of the SAT. Results indicated that trained students scored
significantly higher on two subtests, and descriptively higher on a
third subtest. In a second experiment, 24 special education
teachers (of approximately 200 students) were assigned at random
to training and control conditions. Training condition teachers
were given materials for five days of training of test-taking
skills for the Iowa Test of Basic Skills (ITBS). Data from this
investigation will be analyzed during Year 3 of the project.



PROJECT OVERVIEW

The primary objective of this project was to determine
whether scores on standardized achievement tests could be improved
through a combination of reinforcement, practice, and training of
"test-taking skills"; that is, those skills which refer to
understanding of the most efficient means to take a test rather
than knowledge of the content area (see "Research in Progress,"
Appendix A). Such training, if successful, would likely improve
the validity of resulting test scores in that a potential source
of error, i.e., difficulty with format, testing conditions, etc.,
would be eliminated. In addition to the major objectives, several
smaller investigations were planned and carried out, the ultimate
objective of which was to determine whether, in fact, students in
special education placement exhibited specific deficiencies on
selected aspects of test-taking.

Year One Activities 

A series of studies was initiated to evaluate what specific
skills lower functioning students may lack with respect to test
taking, and to develop a new set of materials which might address
these needs. Accomplishments are described below by each task.

1. Assessment of spontaneously employed test-taking
strategies (July-December, 1983). A shorter version of the
Stanford Achievement Test, Reading subtests, questionnaire form
and follow-along sheet, was developed in order to evaluate the
skills students spontaneously employed in test-taking situations.
These materials were utilized in several studies to acquire this
information. Students were selected from two remedial and one
regular program from each of grades 1 through 7. Students were
individually administered selected subtests of the Stanford
Achievement Test. They were asked for their level of confidence
for each answer and the strategies they had chosen for answering
the questions. It was determined that a complete hierarchy of
strategies existed with respect to answering test questions beyond
simply knowing or not knowing the answer, and that these
strategies resulted in differential levels of performance on the
part of the students. This investigation is described in detail
in the manuscript in Appendix B entitled, "An Analysis of
Children's Strategy Use on Reading Achievement Tests." This
manuscript has been published in Elementary School Journal.

Additional evaluation of the data from this investigation
indicated the existence of a developmental trend through the
elementary grades in the use of elimination strategies on
ambiguous multiple choice items. That is, as children got older
they became more proficient with respect to their spontaneous
ability to eliminate inappropriate or obviously incorrect
alternatives. These results have also been described in detail in
the manuscript entitled, "Developmental Aspects of Test-Wiseness
for Absurd Options: Elementary School Children," which is given
in Appendix C.



A test of "passage independence" of reading comprehension
test items on the Stanford Achievement Test was developed by
administering items from the Reading Comprehension subtest of the
SAT to college undergraduates. The purpose of this investigation
was to determine what proportion of these test items were
potentially answerable by employing prior knowledge or deductive
reasoning skills. It was determined that college undergraduates
were able to answer nearly 80% of these questions on the average,
with many students answering them all correctly. This article is
given in Appendix D under the title, "Passage Independence in
Reading Achievement Tests: A Follow-Up," and has been published
in the journal Perceptual and Motor Skills.

Two follow-up investigations were intended to examine more
precisely the nature of test-taking strategies employed by
learning disabled students, specifically as compared with the
strategies employed by their non-disabled counterparts. In one
investigation, LD and non-LD students were administered items from
the Stanford Achievement Test, Reading Comprehension subtest, with
the actual reading passages deleted from the test. Students were
told to simply answer the questions the best that they could. In
the second experiment, all items were read to both groups of
students in order to control for general reading ability. In both
experiments, students not classified as learning disabled scored
significantly higher on this test of "passage independent" test
items than did their learning disabled counterparts. These
results (a) indicated that learning disabled students may differ
with respect to spontaneous test-taking strategies, such as use of
prior knowledge and deductive reasoning skills, and (b) raised the
issue of what such test items are actually measuring, since they
could be so easily answered without having read the corresponding
passage. This investigation has been written in manuscript form
and is in Appendix E under the title, "Are Learning Disabled
Students Test-Wise: An Inquiry into Reading Comprehension Test
Items." It has been submitted for publication and was presented
at the annual meeting of the American Educational Research
Association, Chicago, April, 1985* (see footnote, page 23).

In a second investigation, learning disabled and non-learning
disabled students were directly questioned with respect to
strategies they employed on reading comprehension test items and
letter sounds test items. In this investigation, it was found
that learning disabled students did not differ from their non-
disabled peers with respect to answering recall comprehension
questions, with ability to read controlled. However, learning
disabled students were less likely to employ appropriate
strategies to answer inferential questions and reported
inappropriately high levels of confidence in their responses. In
addition, when they did report using appropriate strategies, they
were much less likely to employ them successfully. This project
is described in detail in the manuscript, "Learning Disabled
Students' Spontaneous Use of Test-Taking Skills on Reading
Achievement Tests" (Appendix F). This manuscript has been
accepted for publication in Learning Disability Quarterly and was
presented at the annual meeting of the American Educational
Research Association in New Orleans in April, 1984.

In a separate investigation it was determined that a sample
of elementary-age behaviorally disordered students scored
significantly lower than their nonhandicapped counterparts with
respect to reported attitudes toward tests and the test-taking
situation. This manuscript was published in the journal
Perceptual and Motor Skills and is given in Appendix G. These
investigations, taken together, provided valuable information
regarding the optimal training package to be developed for
use with mildly handicapped students.

An evaluation of major achievement tests was also made in
order to determine whether tests were similar or different with
respect to format demands on the test taker. In this
investigation, all levels of six major achievement tests were
evaluated for number of format changes per minute throughout the
reading achievement subtests. It was determined that
achievement tests varied widely with respect to format demands,
with most format changes occurring in the primary grades. These
results are documented in the manuscript, "Format Changes in
Reading Achievement Tests: Implications for Learning Disabled
Students," which can be found in Appendix H and has been accepted
for publication in Psychology in the Schools.
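
As an illustration only, the "format changes per minute" index
could be computed along the following lines. This report does not
state how a format change was scored, so the sketch assumes that a
change is counted whenever two consecutive items differ in format
and that the divisor is the allotted administration time. The
function name, the format labels, and the 20-minute time limit are
hypothetical and are not taken from the study.

```python
from typing import Sequence


def format_changes_per_minute(item_formats: Sequence[str], minutes: float) -> float:
    """Return the number of format changes divided by administration time.

    Assumption (not from the report): a "format change" is any pair of
    consecutive items whose format labels differ.
    """
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    # Compare each item with the one that follows it.
    changes = sum(
        1 for prev, cur in zip(item_formats, item_formats[1:]) if prev != cur
    )
    return changes / minutes


# Hypothetical subtest: six items, three changes of format, 20 minutes allotted.
formats = ["picture", "picture", "word", "word", "sentence", "word"]
rate = format_changes_per_minute(formats, minutes=20)
print(f"{rate:.2f} format changes per minute")
```

Under these assumptions, a subtest that switches format often within
a short time limit receives a higher index than one that holds a
single format throughout.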

In order to evaluate appropriately all previous attempts to
train test-taking skills in the elementary grades, a meta-analysis
was completed of all available studies in this area. It was
determined that although the general effect of training was
positive, differences in favor of training groups did not seem to
become substantial unless training was relatively extensive. In
addition, this meta-analysis revealed that low SES children and
primary grade children were more likely to benefit from extended
training hours. This seems to underline the importance in the
present project of implementing a package with a higher level of
intensity. The detailed results of this meta-analysis are given
in Appendix I under the title, "Teaching Test-Taking Skills to
Elementary Grade Students: A Meta-Analysis." This manuscript has
been accepted for publication in Elementary School Journal.

Finally, during the first part of the project, the scope of
the proposed research was described and published by Exceptional
Children in the fall of last year and is given in Appendix A under
the title, "Research in Progress: Improving the Test-Taking
Skills of Learning Disabled and Behaviorally Disordered Elementary
Students." In addition, during the fall, preliminary findings
were reported at the seventh annual conference on Severe Behavior
Disorders of Children and Youth in Tempe, Arizona, in a
presentation entitled, "Training Behaviorally Disordered Children
to Take Tests."

It was the intention of all of the above investigations to
evaluate both tests and test-taking strategies of mildly
handicapped students in order to determine the most likely
strategies for intervention and the form that intervention should
take. In all, it was determined that mildly handicapped students
do differ from their nonhandicapped peers with respect to use of
appropriate strategies on standardized achievement tests. It was
also determined that these strategy deficits included use of prior
knowledge, use of deductive reasoning skills, attention to
appropriate distractors, and selection of strategies appropriate
to correctly answering different types of items.

2. Development and revision of training materials
(September-February, 1983-1984). Based upon results of the above
investigations and careful evaluation of the Stanford Achievement
Test, materials were developed which were intended to teach to
second, third, and fourth grade children in special education
placements skills appropriate to the successful taking of the
Stanford Achievement Test. These materials included eight
scripted lessons and a student workbook of exercises on subtests
meant to be very similar to those used on the Stanford Achievement
Test. These materials were intended to teach both general test-
taking strategies, such as efficient time usage, and
specific lessons meant to increase understanding of the particular
test demands of the individual reading subtests of the Stanford
Achievement Test. These materials are included with the Year 1
Final Report and corresponding ERIC Document and are entitled
"Super Score."

Following the preliminary development of materials, they were
pilot-tested in November on two groups of second grade children
with learning and behavioral disorders. On the basis of this
pilot investigation several revisions were made in the materials.
Specifically, some of the lessons proved to be too long, and some
instructions were judged to be ambiguous. In addition, a pre- and
posttest measure which was developed for use with this population
was judged to be inadequate to effectively assess progress
made on these materials.

On the basis of the initial pilot investigation, the
materials were revised and expanded to include second to fourth
grades and were then implemented in a larger field test involving
16 students in special education placements in second and third
grades. Students were randomly assigned to treatment and control
groups at each of the three grade levels, and the lessons were
administered to the treatment groups. Students in the
experimental group were seen to score higher than students in the
control group on a shortened version of the Stanford Achievement
Test, Word Study Skills subtest. This investigation was reported
in a manuscript which was published in Perceptual and Motor
Skills (Appendix J).

Some final revisions were made of the training materials on
the basis of the second field test, and materials were finally
prepared for spring implementation immediately prior to district-
wide standardized test administration. While final revisions were
being made, individual schools were contacted to be involved in a
larger experimental study intended to validate these materials.
For this study, approximately 110 students enrolled in special
education classes in grades 2, 3, and 4 in two different large
elementary schools were selected and randomly assigned to
treatment and control conditions. Four persons, including the
principal investigator, took part in the two-week training period
which was administered at the end of March. This training was
administered in eight 20- to 30-minute sessions given from Monday
to Thursday for each of two weeks immediately prior to district-
wide test administration. At the same time, materials were
developed intended to increase test-taking skills on the
Comprehensive Test of Basic Skills and were administered in the
school districts adjacent to Utah State University. This training
package was implemented in local third grade classes in order to
determine (a) whether these procedures were appropriate for whole-
class administration, (b) whether the materials developed for the
Stanford Achievement Test could be easily adapted to other tests,
and (c) whether such training could be seen to have an impact upon
test scores, attitudes, and time on task during test
administration.

The results of the training on the Comprehensive Test of
Basic Skills in the local third grade classes indicated that
students' attitudes had, in fact, qualitatively improved as a
result of the test training. It was suggested that the test
training had resulted in a more normal distribution of attitudes
after the end of the three days of testing, which implied that the
training had made the test-taking experience itself less traumatic
for third grade regular classroom students (including
15% mildly handicapped students). Time on-task during directions
and during the test-taking experience itself did not seem to be
affected by the training package. In addition, the training was
seen to significantly increase the scores of students in the lower
half of the class on the Word Attack subtest of the reading test.
Analysis of the top half, or the group as a whole, was not
possible due to the presence of strong ceiling effects in both
experimental and control groups. This investigation has been
written in manuscript form and is given in Appendix K under the
title, "The Effects of Training in Test-Taking Skills on Test
Performance, Attitudes, and On-Task Behavior of Elementary School
Children."

Results of the training package with second, third, and
fourth grade special education students also indicated that the
training was successful in improving scores on standardized
achievement tests. Although only descriptive differences were
seen in some subtests, the training package significantly improved
the performance of the experimental students over control students
in the Word Study Skills subtest. This improvement was judged to
be approximately equivalent to a three- to four-month increase in
equivalent grade level. The improvement observed in the Word
Study Skills subtest was attributed to the fact that this
particular subtest involved many smaller subtests, several format
changes, and potentially confusing directions, for which the
training package was thought to have been particularly helpful.
Descriptive differences were seen in other subtests of the SAT
but, these not being statistically significant, it is not
possible to determine whether they were a result of the training
or simply sampling error. Evaluation of scores of the second
grade students indicated that they apparently had not benefited
from the training package. However, the differentially small
number of subjects in the second grade sample, attrition suffered
during the training, and the fact that the two second grade groups
were in retrospect found to have differed with respect to the
previous year's testing obscure clear interpretation of these
data. It may be, for example, that second grade LD and BD
students have insufficient reading and other academic skills to
enable them to benefit from this training package, or it could be
that these students had in fact benefited but that due to sampling
and attrition problems these benefits were not observed. This
entire investigation has been described in detail and is given in
Appendix L under the title, "Improving the Test-Taking Skills of
Behaviorally Disordered and Learning Disabled Children," which has
been accepted for publication in Exceptional Children.

Year Two Activities

Progress of the second year's activities has proceeded in
accordance with the planned schedule of activities. These
activities are described below:

1. Teacher validation for training materials (July through
June, 1984-1985). Materials developed during Year 1 were further
adapted for teacher use for the Iowa Test of Basic Skills (see
Appendix U) and given to a randomly assigned experimental group of
special education teachers (N = 24) in Mesa, Arizona, for
implementation during the two weeks immediately prior to yearly
testing. This training took place in April, 1985. When test data
become available, test scores will be compared statistically with
those of the control group.

2. Needs assessment. Since a major format change in
standardized tests, which takes place in the upper elementary
grades, is the use of separate answer sheets, a preliminary
evaluation was made of the relative ability of learning disabled
students to utilize separate answer sheets. Results of this
investigation indicated that LD students differed with respect to
speed of responding but not accuracy of responding, with speed
controlled. In addition, descriptive results suggested that LD
students may be more likely to go outside the line of the answer
circle. The manuscript which describes this investigation is
entitled, "Can LD Students Effectively Use Separate Answer
Sheets?" and is found in Appendix M. Additionally, a follow-up
investigation to Year 1 was conducted on attitudes behaviorally
disordered students report toward achievement tests. Although the
findings of Year 1 were somewhat contradictory (see Appendix G),
the Year 2 investigation provided additional information that BD
students do express more negative attitudes toward testing. The
manuscript describing this investigation is entitled, "Attitudes
of Behaviorally Disordered Students Toward Tests: A Replication,"
and is in Appendix N. Findings from the above two investigations
were considered in developing Year 2 training materials.

3. Materials development (September/October, 1984). Based
upon the results of a needs assessment, materials were developed
to teach specific test-taking skills on reading and mathematics
achievement tests and are given in Appendix T. Information gained
from the development of materials from Year 1 was utilized. Since
the Year 1 studies did not result in improvement on reading
comprehension subtests, training in this area was intensified.

4. Pilot test (November/December, 1984). As the materials
were developed, they were pilot-tested on a small group of
children in order to determine whether they do, in fact, teach the
skills which they are intended to teach.

5. Field test (November/December, 1984). This test was not
conducted because of the early (February 1) test administration in
the Granite District this year and the fact that pilot-testing
results were satisfactory.

6. Experimental study (January/February, 1985). Based upon
the results of the pilot test and the results of training from
Year 1, an experimental study involving approximately 100 students
in special education classes in grades four through six was
implemented immediately prior to the regularly scheduled
administration of district-wide tests, February 1. This training
employed five 20- to 30-minute lessons with accompanying
workbooks.

Test scores of experimental and control students were entered
into a 2 (experimental vs. control) by 2 (LD vs. BD) analysis of
variance on each of the five trained subtests. Results replicated
those of Year 1 in that a significant effect was found for trained
students on the Word Study Skills subtest. Trained students
scored an average of 9 percentile points higher than untrained
students, consistent with Year 1 findings, and considerably higher
than many previous findings with non-handicapped students. In
addition, a significant effect favoring trained students was found
on the Mathematics Concepts subtest. An obtained interaction on
this subtest indicated that training had exhibited a differential
effect on behaviorally disordered students. In addition, a
descriptive but non-significant effect favoring trained students
was found on the Mathematics Computation subtest. As in Year 1,
no effect was found for the Reading Comprehension subtest. This
investigation is described in detail in the manuscript entitled,
"The Effects of Coaching on the Standardized Test Performance of
Mildly Handicapped Students," which is given in Appendix O.
7. Other accomplishments. A review paper critically
evaluating "test-wiseness" and its implications for special
education was written and accepted for publication in the journal
School Psychology Review (Appendix P). Also, a meta-analysis of
research on test-anxiety is being conducted. To date, 80% of
available articles have been coded. A paper describing the
utility of standardized achievement test scores was presented at
the Conference on Severe Behavior Disorders, Tempe, Arizona,
November, 1984* (see footnote, page 23). A manuscript based on
this presentation has been accepted for publication in Behavioral
Disorders Monographs and is in Appendix Q. Another paper
describing differences between LD and BD students in achievement
test scores, entitled "Academic Characteristics of Behaviorally
Disordered and Learning Disabled Students," has been tentatively
accepted for publication in Behavioral Disorders and is in
Appendix R. Finally, a presentation describing the project's
activities was given at the annual meeting of the Association for
Children and Adults with Learning Disabilities, San Francisco,
February, 1985* (see footnote, page 23), and was attended by
approximately 300 professionals. The paper from this project is
in Appendix S.

Titles of project publications, presentations, manuscripts,
and training materials generated to date are given on the
following pages.

Publications

1. Lifson, S., Scruggs, T. E., & Bennion, K. (1984). Passage
   independence in reading achievement tests: A follow-up.
   Perceptual and Motor Skills, 58, 945-946. (Appendix D)

2. Mastropieri, M. A., Jenkins, V., & Scruggs, T. E. (in press).
   Academic and intellectual characteristics of behaviorally
   disordered children and youth. Monographs in Behavior
   Disorders, 9. (Appendix Q)

3. Scruggs, T. E. (in press). Administration and interpretation
   of standardized achievement tests with learning disabled
   and behaviorally disordered elementary school children.
   Final Report. Logan, UT: Utah State University. (ERIC
   Document Reproduction Service)

4. Scruggs, T. E., Bennion, K., & Lifson, S. (1985). An
   analysis of children's strategy use on reading achievement
   tests. Elementary School Journal, 85, 479-484. (Appendix B)

5. Scruggs, T. E., Bennion, K., & Lifson, S. (in press).
   Learning disabled students' spontaneous use of test-taking
   skills on reading achievement tests. Learning Disability
   Quarterly. (Appendix F)

6. Scruggs, T. E., & Lifson, S. A. (in press). Current
   conceptions of test-wiseness: Myths and realities.
   School Psychology Review, 14(3). (Appendix P)

7. Scruggs, T. E., & Mastropieri, M. A. (in press). Improving
   the test-taking skills of behaviorally disordered and
   learning disabled students. Exceptional Children.
   (Appendix L)

8. Scruggs, T. E., Mastropieri, M. A., Tolfa, D., & Jenkins, V.
   (1985). Attitudes of behaviorally disordered students
   toward tests. Perceptual and Motor Skills, 60, 467-470.
   (Appendix G)

9. Scruggs, T. E., & Tolfa, D. (1985). Improving the test-
   taking skills of learning disabled students. Perceptual
   and Motor Skills, 60, 847-850. (Appendix J)

10. Scruggs, T. E., White, K. R., & Bennion, K. (in press).
    Teaching test-taking skills to elementary grade students:
    A meta-analysis. Elementary School Journal. (Appendix I)

11. Scruggs, T. E., & Williams, N. J. (in press). Teaching test-
    taking skills to learning disabled and behaviorally
    disordered children. SUPER SCORE: Test taking manual and
    workbooks. Logan, UT: Utah State University. (ERIC
    Document Reproduction Service)

12. Taylor, C., & Scruggs, T. E. (1983). Research in progress:
    Improving the test-taking skills of learning disabled and
    behaviorally disordered elementary students. Exceptional
    Children, 50, 277. (Appendix A)

13. Tolfa, D., Scruggs, T. E., & Bennion, K. (in press). Format
    changes in reading achievement tests: Implications for
    learning disabled students. Psychology in the Schools.
    (Appendix H)

Presentations

1. *Scruggs, T. E. (1985, February). Improving the test-taking 

skills of learning disabled students . Paper presented at 
thS annual meeting of the Association for Children and 
Adults with Learning Disabilities, San Francisco, CA. 
(Appendix S) j 

2. Scruggs, T. E., Bennion, K., & Lifson, S. (19847^pri 1 ) . 

Spontaneously employed test-taking strategies of high and 
low comprehending elementary school children . Paper 

presented at the annual njeeting of the American ^ 

Educational Research Association, New Orleans, LA. 

( 

(Appendix B and F) 

\ 

3. *Scruggs, T. E., & Lifson, S. A. (1985, April). Are learning 

i » • ' ■ 

disabled students 't.estwise 1 ? An inquiry into~reading 

comprehension test items . Paper presented afc the annual 

meeting of the American Educational Rfesear^fh Association, 

■ r 

Chicago, IL. (Appendix E) 

4. *Scruggs, T. E., & Mastropieri, M. A. (1984, November). 

Academic characteristics of behaviorally disordered and 
learning disabled students . Paper presented at the eighth 
annual conference on Severe Behavior Disorders of Children * 
and Youth, Tempe, AZ. 
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5. Scruggs, T. E., & Taylor, C. (1983,' November) . Training 
behaviorally disordered children to take tests . Paper 
presented at the seventh annual conference on Severe ♦ 
Beh.avior Disorders of Children and Youth, Tempe,* AZ. 
*Out-of-state travel funds were not awarded this project for the
1984-85 funding year. However, since it is the view of the
principal investigator that national presentations are of
critical importance for immediate and widespread dissemination of
project findings, alternate sources of funding were located to
meet these expenses.
Manuscripts

1. Scruggs, T. E., Bennion, K., & Williams, N. J. (1985). The
effects of training in test-taking skills on test
performance, attitudes, and on-task behavior of elementary
school children. Unpublished manuscript, Utah State
University, Logan, UT. (Appendix K)

2. Scruggs, T. E., & Lifson, S. A. (1984). Are learning disabled
students 'test-wise?': An inquiry into reading
comprehension test items. Manuscript submitted for
publication. (Appendix E)

3. Scruggs, T. E., & Mastropieri, M. A. (1984). Academic
characteristics of behaviorally disordered and learning
disabled students. Manuscript submitted for publication
(accepted pending revisions, Behavioral Disorders).
(Appendix R)

4. Scruggs, T. E., Mastropieri, M. A., & Tolfa, D. (1985). The
effects of coaching on the standardized test performance of
mildly handicapped students. Unpublished manuscript, Utah
State University, Logan, UT. (Appendix O)

5. Scruggs, T. E., & Tolfa, D. (1985). Developmental aspects of
test-wiseness for absurd options: Elementary school
children. Unpublished manuscript, Utah State University,
Logan, UT. (Appendix C)

6. Tolfa, D., & Scruggs, T. E. (1985). Can LD students
effectively use separate answer sheets? Manuscript
submitted for publication. (Appendix M)

7. Tolfa, D., Scruggs, T. E., & Mastropieri, M. A. (1985).
Attitudes of behaviorally disordered students toward tests:
A replication. Manuscript submitted for publication.
(Appendix N)
Unpublished Products

Scruggs, T. E. (1985). SUPER SCORE II: Training manuals and
workbooks for the Comprehensive Test of Basic Skills.
Logan, UT: Utah State University. (Appendix V)

Scruggs, T. E. (1985). SUPER SCORE III: Training manuals and
workbooks for the Iowa Test of Basic Skills. Logan, UT:
Utah State University. (Appendix U)

Scruggs, T. E. (1985). SUPER SCORE: Training manuals and
workbooks for the Stanford Achievement Test. (Appendix T)



RESEARCH IN PROGRESS 



Charles C. Cleland
Department Editor



Improving the Test-Taking Skills of
LD and BD Elementary Students

Principal Investigators: Cie Taylor and Thomas
Scruggs, Exceptional Child Center, Utah State
University.

Purpose/Objectives: The purpose of this investigation
is to determine whether reinforcement techniques and
direct training in test-taking skills will increase the
validity of test scores for learning disabled (LD) and
behaviorally disordered (BD) students. To determine the
degree to which LD and BD students exhibit inappropriate
(inefficient) test-taking skills, students are observed
and interviewed while taking standardized tests. Based on
those observational data, procedures and training packages
will be designed to increase student performance on
standardized achievement tests. If the procedures and
training are effective, educational decisions, which are
frequently based in part on the results of standardized
achievement tests, will be more valid because problems in
areas such as test-taking skills, student motivation, and
confusion due to testing format will be reduced or
eliminated.

Subjects: Subjects are 100 elementary students enrolled
in 12 resource rooms and self-contained classrooms for
children with learning disabilities and behavioral
disorders.

Methods: LD and BD children matched on age, handicap,
and standardized achievement test score will be randomly
assigned to experimental and control groups. Students in
the experimental group will receive materials and
procedures designed to improve the ability of handicapped
students to take tests. Experimental and control groups
will be compared statistically on several measures,
including attitudes toward test-taking, student and
teacher behavior during test administration, and actual
performance on standardized tests of reading achievement.
In following years, materials will be developed and
implemented for mathematics achievement tests and
test-taking skills for secondary-age handicapped students.

Results to Date: Preliminary findings indicate that
many LD and BD children, as well as low achieving
nonhandicapped students, do not spontaneously exhibit
efficient test-taking behaviors. Specifically,
handicapped children have been seen to exhibit
difficulties with item format and distractors more
typical of naive test takers.

Commencement and Estimated Completion Dates: This
investigation began July 1, 1983 and is expected to
continue for three years.

Funding: Funding for this investigation has been
provided by a grant from the U.S. Department of
Education, Research in Education of the Handicapped.

Publications/Products Available: Preliminary materials
for improving test-taking skills, piloted on
nonhandicapped second-grade students, have been
developed and will be revised for use with handicapped
children during the coming year. Manuscripts documenting
the investigation will be completed and submitted for
publication during the second half of the academic year.
Please write the authors for further information.



"Research in Progress" is a forum for reporting
ongoing research in the field of special education
that has not yet been published. Investigators
wishing to report studies in progress are invited
to submit a brief synopsis of their efforts to the
column editor, Charles C. Cleland, 3427 Monte
Vista, Austin, TX 78731. Reports are to be submitted
in triplicate and should follow the format
shown above, with a maximum length of 500
words.
Much of what constitutes reading instruction
in today's public schools reflects students'
scores on standardized achievement
tests. Test performance may influence later
assignment to reading groups, classrooms,
or remedial or special education pro- 
grams. Although norm-referenced read- 
ing tests have been criticized as insensitive 
to specific skill deficits and inadequate as 
complete diagnostic measures (Howell 
1979), most reading tests have nonetheless 
been shown to be highly reliable and valid 
(Spache 1976). For better or worse, stand- 
ardized reading tests are truly a part of 
education today and will most likely be used 
in the future. 

If important decisions are to be based 
on the results of standardized reading tests, 
student scores should provide the best pos- 
sible estimate of reading performance. 
Unfortunately, the results of past research
indicate that reading test performance can
be influenced by factors other than knowledge
of test content (e.g., Taylor & White
1982). One of these factors, "test-wiseness"
(TW), was first described in detail in
1965 by Millman, Bishop, and Ebel (p. 707)
as "a subject's capacity to utilize the
characteristics and formats of the test and/or
the test-taking situation to receive a high
score." Millman et al. developed an outline
of test-wiseness principles, which included
time-using strategies, error-avoidance
strategies, guessing strategies, and
deductive-reasoning strategies. Slakter, Koehler,
and Hampton (1970) presented information
suggesting that TW has a developmental
component. That is, students may
become more "test-wise" as they grow
older. Generally, researchers have inferred
extent of TW on the basis of tests
constructed specifically for this purpose.

Students themselves were questioned
recently about strategies they use to answer
test questions. Haney and Scott (1980)
administered a number of achievement
tests to 11 students, then questioned them
the following day concerning how they
attempted to answer each item. These
researchers developed a complex model in
which responses to interviewer questions
were classified into 46 separate categories.
Most of these categories included the use
of some specific strategies such as guessing,
elimination of alternatives, or "reasoning."
Their results indicated that children
use a wide range of strategies in answering
test questions and that often a child's
perception of item content bears little
resemblance to the intentions of the test's
author. Haney and Scott concluded that
considerable "ambiguity" exists in
standardized test questions, existing to a greater
extent in science and social studies areas
and to a lesser extent in reading areas.

Haney and Scott's work contributed
significantly to our knowledge of the nature
of ambiguous test items. However, the
focus of their study was on test construction,
with implications for the reduction
of test item ambiguity. Although classroom
teachers may use the results of Haney
and Scott to improve their own tests,
published standardized tests cannot be altered
by teachers. A remaining question
concerns the extent to which students employ
test-taking strategies when faced with
difficult or ambiguous items. Do students
use such strategies spontaneously (that is,
without being trained)? If so, which strategies
(if any) are effective in obtaining correct
answers? No previous research can be
located to answer these questions.

To address these questions in the present
study, the reading test performance of
elementary school children was examined.
Specifically, two areas were investigated:
the strategies students spontaneously employed
to answer reading test items and
the relative effectiveness of these strategies
in increasing reading test scores.

Procedure 

A sample multiple-choice reading test
based on items from the Stanford Achievement
Test (SAT) (Madden, Gardner, Rudman,
Karlsen, & Merwin 1973) was developed
and piloted on five students to
evaluate whether the length was appropriate
and to establish reliable scoring conventions.
This sample test included items
from the Word Reading, Reading Comprehension,
Word Study Skills, and Vocabulary
subtests. After revisions had been
made, it was administered to 31 elementary-age
Caucasian students (15 girls, 16
boys) attending summer classes in a rural
western area. Students were selected from
both remedial and "enrichment" classes so
a range of abilities was represented. As
assessed by the Woodcock Reading Achievement
Test (Woodcock 1973), 20 students
read at or above grade level; 11 read below
their grade level. Most students (20) were
second or third graders, but students were
also selected from Grades 1 (two students),
4 (two), 5 (five), and 6 (two).

All students were seen individually by
one of four examiners. One examiner
interviewed 18 students, whereas the other
three interviewed two, four, and six students.
First, students were given the Passage
Comprehension subtest from the
Woodcock Reading Achievement Test in
order to identify an approximate reading
comprehension grade equivalent. Students
were then given selections from the
SAT one year level higher than their assessed
grade level on the Woodcock subtest.
In this manner, a similar difficulty level
was provided for each student. Most students
were able to answer correctly approximately
two-thirds of the test questions.

Students were then told to read aloud
each test question (as well as the reading
passages in the Reading Comprehension
subtest) and whichever of the distractors
they chose to read. They were neither encouraged
nor discouraged from reading
each distractor. As soon as students had
answered a test question, they were asked
to rate their level of confidence in their
response: were they very sure, somewhat
sure, or not sure the answer they had given
was correct? After students had finished
each subtest, they were asked to reread the
questions and tell the examiner why they
had chosen their answer. The examiner
recorded reading errors, confidence levels,
attention to distractors, reference to
reading passages, and reported strategies.
Sessions were tape-recorded to clarify any
later ambiguity in scoring. Students spent
45-90 minutes in the session and answered
31-42 test questions. Some students received
more questions than others because
different levels of the SAT required different
subtests and formats.

Results and discussion 
Effectiveness of strategies 

We found all strategy responses could
be classified within a 10-level hierarchy that
strongly predicted the probability of responding
correctly. Proportions of correct
responses were computed across subjects
for each type of strategy and are shown in
figure 1. These classifications were as follows:
(a) skipped (student skipped the item),
(b) misread a key word in question or distractors,
(c) used faulty reasoning (example:
one student reported, "This word must
be the correct answer because it has a period
after it"), (d) did not follow directions,
(e) guessed, (f) "seemed right" (student
thought the answer was correct without
being able to state an explicit reason), (g)
used external information (example: "I
know most people in fires die from breathing
smoke because a fireman told me that"),
(h) eliminated inappropriate alternatives,
(i) referred to passage, and (j) clearly
"knew" the answer (example: "I know that
a pear is a kind of fruit"). The existence
of these strategies indicates that a complete
hierarchy of test-taking skills exists
beyond simply knowing or not knowing the
answers, and these strategies can be more
or less effective on a standardized reading
test. For example, as seen in figure 1, when
students skipped an answer, nothing was
correct; when they guessed, they got 37%
correct; when they eliminated alternatives,
they got 67% correct. Proportions of employed
strategies are given in table 1.

Fig. 1. Percent correct answers by strategy
used. Strategy classifications: 0, skipped item; 1,
misread keyword; 2, faulty reasoning; 3, did not
follow directions; 4, "seemed right;" 5, guessed; 6,
used external evidence; 7, eliminated; 8, referred
to passage; and 9, clearly "knew."

We condensed these strategies into five
logical categories (skipping, procedural error,
guessing strategy, deliberate strategy,
and "knowing") and computed point-biserial
correlations for each subject. The
median correlation between item score and
reported strategy was .54 (p < .01), a correlation
of moderate strength.¹ No differential
effects were seen by age, ability level,
or examiner, although the sample was too
small to conclusively investigate these possibilities.
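
The point-biserial coefficient reported here is the Pearson correlation between a two-valued variable (item correct or incorrect) and an ordered one (the condensed strategy category). The Python sketch below shows one way such a coefficient could be computed for a single student. The function name and the eight responses are hypothetical illustrations, not data or procedures from the study.

```python
from math import sqrt


def point_biserial(binary, scores):
    """Point-biserial correlation between a 0/1 variable and numeric scores.

    Uses r_pb = (M1 - M0) / s * sqrt(p * q), where s is the population
    standard deviation of all scores and p is the proportion of 1s.
    """
    if len(binary) != len(scores) or not binary:
        raise ValueError("inputs must be non-empty and the same length")
    ones = [s for b, s in zip(binary, scores) if b == 1]
    zeros = [s for b, s in zip(binary, scores) if b == 0]
    if not ones or not zeros:
        raise ValueError("both outcomes (0 and 1) must occur")
    n = len(scores)
    mean_all = sum(scores) / n
    sd = sqrt(sum((s - mean_all) ** 2 for s in scores) / n)
    if sd == 0:
        raise ValueError("scores must vary")
    p = len(ones) / n
    mean_diff = sum(ones) / len(ones) - sum(zeros) / len(zeros)
    return mean_diff / sd * sqrt(p * (1 - p))


# Hypothetical responses for one student: item score (1 = correct) and the
# condensed strategy category (1 = skipping ... 5 = "knowing") for each item.
item_correct = [0, 0, 1, 0, 1, 1, 1, 1]
strategy_level = [1, 2, 3, 3, 4, 5, 5, 5]
r_pb = point_biserial(item_correct, strategy_level)
print(round(r_pb, 2))
```

In a design like the one described, a coefficient of this kind would be computed once per student, and the median taken across students.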

Inspecting figure 1 reveals some other
interesting findings. The high proportion
of correct scores for guessing is notable.
Since the number of answer choices varied
between subtests and levels, with four
choices the most common format, the
probability of responding correctly by
chance alone was estimated at .28. In fact,
when students reported guessing, they
scored 37% correct. "Guessing" responses
scored virtually the same as "seemed right"
responses, suggesting that even when students
believe they are guessing, they still
have some idea of what the correct answer
might be and can use this strategy to advantage.
"Seemed right" responses were
common on the vocabulary subtests in
which students often reported that a particular
definition sounded correct but were
otherwise uncertain. Another interesting
finding is the high proportion of correct
responses when the students reported using
outside information or experience. Although
content area tests, such as science
and social studies, directly test outside
knowledge, reading tests ostensibly are intended
to test nothing besides knowledge
of the passage's content. Therefore, although
use of outside information should
not help, students did benefit from the use
of such information (however, when students
referred to the passages they scored
even higher). The students' ability to use
outside information as effectively as they
did is surprising. This finding underlines
the "passage independence" problems of
reading comprehension items, a topic well
investigated by researchers such as Tuinman
(1973-74).

Table 1. Frequencies (f) and Percent (%) of Strategies Employed

Strategy                          f       %

0. Skipped item                   9     1.0
1. Misread keyword               23     2.6
2. Faulty reasoning              38     4.3
3. Did not follow directions      7      .8
4. "Seemed right"                92    10.5
5. Guessed                      127    14.4
6. Used external evidence        21     2.4
7. Eliminated                    45     5.1
8. Referred to passage           59     6.7
9. Clearly "knew"               458    52.1
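
The percentages in table 1 follow directly from the frequencies. As a check, the short Python sketch below recomputes them. The frequencies are those printed in the table; the label for classification 5 is taken from the figure 1 caption, and the variable names are illustrative assumptions. The total it produces, 879, matches the 302 incorrect and 577 correct responses reported in the section on carelessness.

```python
# Frequencies as printed in table 1, keyed by strategy classification.
frequencies = {
    "Skipped item": 9,
    "Misread keyword": 23,
    "Faulty reasoning": 38,
    "Did not follow directions": 7,
    "Seemed right": 92,
    "Guessed": 127,
    "Used external evidence": 21,
    "Eliminated": 45,
    "Referred to passage": 59,
    "Clearly knew": 458,
}

total = sum(frequencies.values())

# Percent of all recorded responses, rounded to one decimal as in the table.
percents = {name: round(100 * f / total, 1) for name, f in frequencies.items()}

for name, f in frequencies.items():
    print(f"{name:<28}{f:>5}{percents[name]:>7.1f}")
print(f"{'Total':<28}{total:>5}")
```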

Level of confidence 

Students had a reasonably good idea of
whether they had answered a test question
correctly. When students reported being
"very sure" their answer was correct, they
were correct 81% of the time. When they
reported being "somewhat sure," they
were correct only 13% of the time, and
when they reported being "not sure," they
obtained correct answers only 7% of the
time. However, these figures are somewhat
misleading. The results seem different
if looked at another way: when students
answered incorrectly, they also
reported being "very sure" the answer was
correct in 56% of the cases. Clearly, although
related to performance, level of
confidence in itself is not a sufficient check
on correctness of a student's work. The
relation between confidence and correctness
of response was seen to vary widely
from student to student, with a median
point-biserial correlation of .29 (p > .05).
Therefore, in many cases, other means are
necessary for students to assess the correctness
of their response. These means
will be described below.
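
The two ways of looking at confidence condition on different things: the proportion correct among "very sure" responses, and the proportion "very sure" among incorrect responses. The Python sketch below illustrates the distinction. The counts are hypothetical and chosen only for illustration; they are not the study's data, and the function names are assumptions.

```python
# Hypothetical response counts, NOT data from the study.
counts = {
    ("very sure", "correct"): 80,
    ("very sure", "incorrect"): 20,
    ("less sure", "correct"): 10,
    ("less sure", "incorrect"): 15,
}


def p_correct_given_confidence(counts, confidence):
    """Proportion of responses at one confidence level that were correct."""
    correct = counts[(confidence, "correct")]
    incorrect = counts[(confidence, "incorrect")]
    return correct / (correct + incorrect)


def p_confidence_given_outcome(counts, confidence, outcome):
    """Proportion of responses with one outcome given at one confidence level."""
    outcome_total = sum(n for (_, o), n in counts.items() if o == outcome)
    return counts[(confidence, outcome)] / outcome_total


print(p_correct_given_confidence(counts, "very sure"))  # 0.8
print(round(p_confidence_given_outcome(counts, "very sure", "incorrect"), 2))  # 0.57
```

With these illustrative counts, high confidence usually goes with a correct answer, yet more than half of the errors were also made with high confidence, which is the pattern the paragraph describes.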


The cost of carelessness 

In addition to reported test-taking
strategies, information was also collected
on the degree to which the students attended
to distractors and chose their answers
by referring to the reading passage
on the Reading Comprehension subtest.
Results showed that students rarely referred
to the reading passage, even though
when they did, they stood a very good
chance of answering the question correctly.
In 89% of the cases where students
answered a reading comprehension question
incorrectly, they had not referred to
the passage that clearly contained the correct
answer. Of course, this does not mean
that all of these questions could have been
answered correctly had students referred
to the passages, but it does appear that
reading scores could be greatly improved
by students' increased attention to the passages.

Similarly, a great deal of carelessness
was observed in attention to distractors.
When students answered incorrectly, in
40% of the 302 cases they had not read all
distractors. Again, this finding does not
mean all these questions could have been
answered correctly by greater attention to
distractors, but students could almost certainly
have improved their scores by doing
so. When students answered questions correctly,
they had attended to all distractors
in 73% of the 577 cases. It does appear,
then, that test performance can be improved
through greater attention to distractors.
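
The two percentages in this paragraph can be combined into a rough comparison of accuracy with and without full attention to distractors. The Python sketch below rebuilds approximate cell counts from the rounded percentages reported in the text (40% of 302 and 73% of 577), so the resulting figures are estimates derived here, not values reported by the authors. Variable names are illustrative.

```python
incorrect_total = 302  # incorrect responses reported in the text
correct_total = 577    # correct responses reported in the text

# Approximate counts rebuilt from the rounded percentages.
incorrect_partial = round(0.40 * incorrect_total)  # had not read all distractors
correct_full = round(0.73 * correct_total)         # had read all distractors
incorrect_full = incorrect_total - incorrect_partial
correct_partial = correct_total - correct_full

# Estimated proportion correct under each reading behavior.
accuracy_full = correct_full / (correct_full + incorrect_full)
accuracy_partial = correct_partial / (correct_partial + incorrect_partial)

print(incorrect_partial, correct_full)  # 121 421
print(round(accuracy_full, 2))          # 0.7
print(round(accuracy_partial, 2))       # 0.56
```

Under these assumptions, responses made after reading every distractor were correct about 70% of the time, against about 56% otherwise, which is consistent with the authors' conclusion.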

Another surprising finding was the relatively
small effect of reading errors. Although
performance was clearly impaired
when students misread a word of key importance
(see fig. 1), in general misreading
words was less detrimental than might be
expected. When students misread one or
more words in stem or distractor, the proportion
of items answered correctly (58%
of 293) was still quite high. Clearly, many
students have developed strategies for coping
with words they cannot read. It seems
important to remind students not to "give
up" if they cannot read every word. As the
present investigation indicates, students are
often able to answer correctly even though
they cannot read every word.

One final finding concerning carelessness
can be reported. All examiners noted
the extent to which students had acted on
the wrong stimulus in the "word study
skills" subtest. In this subtest, students are
given a word with an underlined sound and
asked to find the same sound in one of
three distractors. The following problem
provides an example:

Prize

(a) prince

(b) size

(c) seven

The correct answer is b because the z
in "size" has the same sound as the underlined
z in "prize." What was surprising to
us is that students often attended to the
wrong stimulus, for example, the initial pr
in the above question. Although the exact
incidence of these errors cannot be given,
their consistent occurrence seems to imply
that teachers should stress the importance
of attending to the underlined sound only.

Conclusions
The results of this study demonstrate that 
students do employ specific strategies to 
cope with test item ambiguity, indecision, 
or lack of knowledge in selecting correct 
answers. These findings have important 
implications directly bearing on student 
performance during testing. To attain the 
most correct answers, students should em- 
ploy the strategies listed below: 

1. Be certain to attend to all distractors
   and refer to the reading passage, even
   if you are "very sure" your answer
   is correct.

2. If you are having great difficulty
   reading a passage, read the questions
   and try to answer them anyway.
   Often, your own knowledge can help
   you choose an answer. If you have
   difficulty with some words in the
   question or distractors, answer anyway
   and base your answers on the
   words you can read.

3. If you have attended to all parts of
   a passage and test question and still
   do not know an answer, there is still
   a good chance of getting the correct
   answer if you guess.

4. Be certain you are attending to the
   appropriate stimulus, such as the
   underlined sound in a "word study
   skills" subtest. As in other subtests,
   wrong answer choices may look correct
   at first glance.

5. Make sure you answer every item.
   Even if you must hurry and guess
   frequently near the end, you will probably
   get some of the answers correct.

Considering the results of past research
(Bangert, Kulik, & Kulik 1983), it is likely
that to affect test performance significantly,
a teacher will have to do more than
simply read the above points to students.
Examples and practice activities will help
students develop these test-taking skills.

These findings should be of interest to
special education teachers, particularly
those in the area of learning disabilities.
Many children are referred for special class
placement on the basis of deficiencies in
standardized reading-test scores. Special
education often is quite beneficial to students
who clearly need it, but before taking
such a dramatic step, teachers should
be certain that the test score reflects the
best abilities of the student rather than a
problem with test taking in general.

The present investigation indicates that
a range of abilities exists in test-taking skills,
as it does in other areas. If tests are to be
as valid as possible, the specific skills observed
in efficient students taking a reading
test should be practiced by all students.
If test-taking skills are incorporated in
general test-administration procedures, it
appears maximum benefit can be derived
from the use of standardized reading tests.



Notes 

The authors would like to thank Dr. Ginger
Rhode and Judy Johnson, as well as Dr. Jay
Monson, acting director, and the staff of the
Edith Bowen School, particularly Dorothy Dobson
and Lou Anderson, for their valuable assistance
with this project. The authors would
also like to thank Ursula Pimentel and Marilyn
Tinnakul for typing the manuscript. Address
requests for reprints to Thomas E. Scruggs,
Exceptional Child Center, UMC 68, Utah State
University, Logan, Utah, 84322.

¹ A point-biserial, rather than a Spearman
correlation of ranks coefficient, was computed
out of concern for the necessarily high number
of ties resulting in computing a rank correlation
with binary data. However, the obtained Spearman
coefficient of .55 differed by only one point
from the obtained point-biserial coefficient of
.54.
Abstract

Twenty-eight students from grades 1 through 5 were administered a test of
test-wiseness for absurd options. Results suggested that a developmental
trend may exist in test-wiseness for elementary-age school children.
Developmental Aspects of Test-Wiseness for Absurd
Options: Elementary School Children

First discussed by Thorndike in 1951, test-wiseness (TW) was described
in detail by Millman, Bishop, and Ebel (1965), and defined as "a subject's
capacity to utilize the characteristics and formats of the test and/or
test-taking situation to receive a high score" (p. 707). They further
described TW as "logically independent of the examinee's knowledge of the
subject matter for which the items are supposedly measures" (Millman et al.,
1965, p. 707). Ebel (1965) has suggested that error in measurement is more
likely to be obtained from students low in test-taking skills. The student
low in TW, therefore, may be more of a measurement problem than the student
high in TW (Slakter, Koehler, & Hampton, 1970b).

Some investigations have indicated that TW has a developmental
component; that is, that TW increases with age. Slakter, Koehler, and
Hampton (1970a) administered a measure of TW to students from grades 5-11
and found a significant overall linear trend for grade level. Crehan,
Koehler, and Slakter (1974) administered a TW test to students in grades 7
through 11, and a follow-up test to the same students two years later;
increases over all intervals except grades 9 to 11 were found. In a second
follow-up of the same students, Crehan, Gross, Koehler, and Slakter (1978)
replicated the previous findings and concluded that although TW increases by
grade, large individual differences exist within grade levels.

Although the above investigations provide strong support for a
developmental component of TW in the secondary grades, as yet no
investigation has evaluated the developmental nature of TW in the elementary
grades. The present investigation is intended to address this question.

Method

Subjects were 28 elementary school-age children attending summer
classes prior to entering grades 1 through 5 in a western rural community.
Students (1 first grader, 9 second graders, 11 third graders, 2 fourth
graders, and 4 fifth graders) were selected from both remedial and
"enrichment" classes so that a variety of ability levels was sampled.

Students were seen individually by one of four examiners. First, they
were administered a five-item test of TW. This test was developed to
measure the ability of students to eliminate options known to be incorrect
(corresponding to the Millman et al., 1965 TW category I-D-1, absurd
options). For example, one of the items was the following:



Good airplane pilots must be able to ________ quickly in an emergency.

1. fall asleep          3. stumble
2. scream               4. think



Students were orally provided with words they were unable to read. Since it
was thought that evidence of TW would be more subtle in an elementary school
population than it was in studies of secondary students, some departures
were made from the procedures of Crehan et al. (1974). First, students were
directly questioned regarding the reasons for their answer choices following
completion of the test. Second, students were scored as reporting no
elimination strategies (0), or reporting one or more strategies (1),
regardless of the "correctness" of their answer to each test question.
Results and Discussion

A point-biserial correlation was computed between entering grade level
of student and presence or absence of reported elimination strategies. The
resulting coefficient, .44, was statistically significant (p < .02) and
represented a moderate relation between grade level of student and reported
use of elimination strategies, accounting for approximately 20% of total
variance. Proportion of students reporting use of elimination strategies by
grade level is given in Figure 1.
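
The share of variance attributed to grade level is the square of the correlation coefficient. A minimal Python check of that arithmetic, using the coefficient reported above (the variable names are illustrative):

```python
r = 0.44  # point-biserial coefficient reported in the text

# The squared correlation gives the proportion of variance accounted for.
variance_explained = r ** 2

print(f"{variance_explained:.1%}")  # 19.4%
```

The result, about 19%, agrees with the "approximately 20%" stated in the text.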

Insert Figure 1 about here



Thus, it appears that a developmental trend in one aspect of TW can be
observed in children of elementary school age, and that this trend is
similar to that seen in older students. These findings must be interpreted
with caution, however, due to the limited sample size, as well as the fact
that only one aspect of TW was measured. Although further research is
needed, the results of this preliminary investigation suggest that students
begin to learn TW skills as early as the primary grades, and that these
skills continue to improve with age.
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Monson and the staff of the Edith Bowen School for their- assistance on this 


project. 
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Figure 1. Proportion of students reporting elimination strategies by 
grade level.




APPENDIX D 






Perceptual and Motor Skills, 1984, 58, 945-946. © Perceptual and Motor Skills 1984

PASSAGE INDEPENDENCE IN READING ACHIEVEMENT
TESTS: A FOLLOW-UP¹



STEVE LIFSON, THOMAS E. SCRUGGS, AND KARLA BENNION
Utah State University 

Summary: 38 college undergraduates were administered reading-comprehension
items from a major standardized achievement test with corresponding
passages deleted. Analysis indicated that, after 20 years of similar research
findings, highly passage-independent items still occur on major tests.

For almost 20 years, it has been documented that reading-comprehension
test items can be answered correctly at above-chance rates without actually
reading the relevant passage (Preston, 1964). Pyrczak (1976) mentions
several types of items which seem particularly independent of the passage.
These types include (a) items that can be answered from the examinee's own
knowledge and (b) items about a particular passage that are related to each
other in such a way that some items provide clues for other items.
Reading-comprehension tests which include such items invite critical attention
on the grounds that (a) examinees using these strategies may have an advantage
over those not using them (Pyrczak, 1972) and (b) if a subject uses these
principles and skips passages, he invalidates the purpose of the test
(Tuinman, 1973-1974). Since an extensive review of the literature has shown no
justification for the use of passage-independent items, the question arises as
to whether these items still occur in commonly used standardized achievement
tests. The present investigation was intended to determine whether such items
are still in use.

Method 

Subjects and Materials 

Thirty-eight undergraduate elementary education students at a western
university completed 16 multiple-choice reading-comprehension questions
without the accompanying passages. The items selected were thought to
represent questions that could be answered without having read the accompanying
passage. These items were chosen to correspond to Millman, Bishop and Ebel's
(1965) categories of test-wiseness: strategies involving the general knowledge
of the test taker and use of subject matter of neighboring items. The specific
effects of these cues, however, were not addressed in this study. The 16 items
were taken from the Stanford Achievement Test Form E, Level P-3, from a
pool of 60 items. The items were kept in clusters illustrating which belonged
together in terms of association with a particular passage.

¹The authors thank Dr. Barnard Hayes for his kind and generous assistance with this
investigation. Requests for reprints should be addressed to Steve Lifson, Exceptional
Child Center, UMC 68, Utah State University, Logan, Utah.
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Procedure 

The materials were distributed to two sections of a class in teaching
reading. The students were told: 'Today I'm going to give you some
reading-comprehension test items without the passages. It is not expected that
you will answer all of the questions correctly; just do your best. Guess if you
do not know the answer.' No time limit was imposed upon the task.

Results and Discussion

Analysis indicated that the mean score was 75% correct, with a mean score
of 11.9 of the 16 items. A one-sample t test (Hays, 1973) confirmed that
the obtained scores were significantly different from chance responding
(t = 18.9, p < .001).

Although the items were not randomly selected for this measure, they
nevertheless represented 25% of the items included in the reading-comprehension
section of the test. Clearly, at least some test developers have done
little to alter passage-independent items in light of the research findings of
almost two decades. While the effects of the readers' previous knowledge
cannot be eliminated, the effects could be minimized by the use of fictional
material for the passages with accompanying questions about the activities of
an imaginary person. In spite of the reported validity of these items (SRA,
1979), the burden of construct validity rests with the authors of the tests. If
some students are able to answer "reading-comprehension" test items correctly
without reading the passage, one can question what is being measured.
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Reading Comprehension Tests 


Abstract 

Previous research has indicated that students in many cases can
answer reading comprehension test questions correctly without
having read the accompanying passage. The present research
compared, in two experiments, the ability of learning disabled (LD)
students and more typical age peers to answer such reading
comprehension questions presented independently of reading passages. In
Experiment 1, LD students scored appreciably lower under
conditions resembling standardized administration procedures. In
Experiment 2, reading decoding ability was controlled for;
however, the performance differential remained the same. Results
suggested a relative deficiency on the part of LD students with
respect to reasoning strategies and test-taking skills. In
addition, the validity of some tests of "reading comprehension" was
discussed.
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'3 

Are Learning Disabled Students "Test-Wise?":
An Inquiry into Reading Comprehension Test Items
For many years, there has been some argument over what
reading comprehension tests "really" measure (e.g., Thorndike,
1973-1974). The most commonly observed standardized reading
comprehension item format consists of a passage and a number of
associated multiple choice questions. Reading and understanding
the passage is assumed to be a necessary pre-condition to
correctly answering the questions. After examining the
literature, however, one is forced to question the assumption of
question dependence on the stimulus passage. Preston (1964) found
that college students were able to answer reading comprehension
items with the passages blacked out at a rate significantly above
chance. Tuinman (1973-1974) administered five major tests to
9,451 elementary-level students under several conditions.
Students in the no-passage condition (relevant passage had been
blacked out) on the average achieved only 30% fewer correct
answers than subjects in the passage-in condition. Similar
results were obtained by Pyrczak (1972, 1974, 1975, 1976) and
Bickley, Reaver, and Ford (1968). A follow-up study of passage
independence by Lifson, Scruggs, and Bennion (1984) revealed that
passage-independent items are still quite common in elementary
level achievement tests. College undergraduates were able to
answer 75%, or almost 12 of 16 questions on the Stanford
Achievement Test, Level P-3, without reading the associated
passages. This score is considerably above that expected by
chance responding.

Scruggs, Bennion, and Lifson (1985) interviewed elementary
age students regarding their responses on a reading comprehension
test. They found that students often chose their answers based
upon their own prior knowledge, rather than content of the reading
passage. When students reported using such prior information,
they answered correctly in over 80% of the cases.

Reading comprehension items which are independent of the
associated passage can be answered on the basis of the following:
(a) general knowledge, (b) interrelatedness of the questions on a
particular passage, and (c) faulty item construction, i.e., keyed
option is twice as long or more precisely stated (Pyrczak, 1975).
In the first two cases, the presence of enough information in the
question stem to identify the topic is an important factor (e.g.,
"Which of the following statements is NOT true of penguins?").
Such a stem may render a question answerable in terms of
information already available to the examinee and provide clues to
the answers of related questions about the same passage that lack
such information in the stem ("This passage is about: a) birds of
South America, b) birds of the Antarctic ... etc."). The cues
which individuals apply to a testing situation to maximize their
score correspond to Millman, Bishop, and Ebel's (1965) criteria of
test-taking skills, or "test-wiseness."
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While test constructors may be able to point to hi^h validity 
coefficients for their reading comprehension tests and subtests,
an important question arises concerning whether all students are
equally able to answer questions with the above mentioned
characteristics without reading the passage. Are some groups of
students at a relative advantage/disadvantage in ability to answer
these questions without reading the passage? To answer this
question, a group of students classified as learning disabled (LD)
and a group of regular classroom students were administered a
selection of multiple choice reading comprehension questions with
the relevant passages removed. The conditions of this experiment
were meant to resemble those of a normal testing situation; i.e.,
students were required to read the questions without assistance.
This did not permit us to determine the extent to which any
observed differences between the regular and LD students were due
to reasoning or variations in general knowledge between the two
groups or simply reflected a difference in reading ability. To
address this issue, a second experiment was performed to see if
similar differences could be found when word reading was
controlled for.

Experiment 1 
Method 

Subjects and Materials 

Subjects consisted of 67 regular classroom and resource room
third grade students selected from several elementary schools in a
western rural area. Of these subjects, 52 were regular classroom
students and 15 were classified as LD by P.L. 94-142 and local
criteria, which included a 40% discrepancy between actual and
expected performance in two areas of academic functioning. The
average grade equivalent of the total reading score of the non-LD
students on the Comprehensive Test of Basic Skills (CTBS) was 3.4
(SD = .8), while the average CTBS total reading score for the LD
students was 2.1 (SD = .5).

Fourteen multiple choice reading comprehension questions
without the accompanying passages were selected for this task.
Items were drawn from the Stanford Achievement Test, Level P-3,
Form E (1982). Items had been chosen to represent questions
thought by the author to be answerable in terms of: (a) the
general knowledge of the test taker, and (b) the degree to which
the interrelatedness of the items served as a cue to the answers.
These items were taken from the Lifson et al. (1984) study, in
which students' ability to answer these questions had been
documented. The items were kept in clusters which belonged
together in terms of association with a particular passage.
Procedure

Treatment was administered in regular instructional 
groupings. Materials were passed out and all students were told 
that they were about to take a reading test for which they would 
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not be shown the accompanying reading-passages, but that they 
should try their best to answer all questions. No time limit was
imposed upon the task.

Results and Discussion

The regular classroom group answered correctly approximately
55% of the questions, for a mean score of 7.8 (SD = 1.96). This score
was significantly above a chance score of 3.5 (t(102) = 11.27,
p < .001). In contrast, the LD students answered correctly only 35%
of the questions, for a mean score of 4.9, only slightly higher
than chance (t(28) = 1.77, ns). The obtained score of the non-LD
group was significantly higher than that of the LD group (t(65) = 4.91,
p < .001).
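
The chance score of 3.5 on 14 items corresponds to a guessing probability of .25, that is, four answer options per item. The option count is our inference from the reported figure, not something the text states, and the binomial model of blind guessing is likewise an assumption. Under those assumptions, a minimal Python sketch of the expected chance score and of the observed proportions correct might look like this:

```python
from math import sqrt

N_ITEMS = 14    # items administered in Experiment 1 (from the text)
N_OPTIONS = 4   # ASSUMPTION: inferred from a chance score of 3.5 on 14 items


def chance_score(n_items: int, n_options: int) -> tuple[float, float]:
    """Mean and SD of the number correct under pure guessing (binomial model)."""
    p = 1 / n_options
    mean = n_items * p
    sd = sqrt(n_items * p * (1 - p))
    return mean, sd


mean_chance, sd_chance = chance_score(N_ITEMS, N_OPTIONS)

# Reported group means converted to proportions correct.
regular_pct = 7.8 / N_ITEMS   # regular classroom group
ld_pct = 4.9 / N_ITEMS        # LD group

print(f"chance: mean = {mean_chance:.1f}, SD = {sd_chance:.2f}")
print(f"regular classroom: {regular_pct:.1%} correct")
print(f"LD: {ld_pct:.1%} correct")
```

The regular classroom mean works out to a little under 56% of the items, which the text reports as approximately 55%; the LD mean corresponds to the reported 35%.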

The present findings suggest that regular classroom students
are able to recognize and make use of cues in testing situations
in order to increase their scores, even when reading passages are
deleted, and "reading comprehension" supposedly cannot be
measured. Apparently, LD students are not able to benefit equally
from these cues. Since neither group should have scored above
chance on a reading comprehension test with the reading passages
deleted, it is possible that a certain amount of bias exists
against children with learning disabilities on some standardized
tests of reading comprehension. Students in regular classes, when
unable to read or otherwise obtain meaning from reading passages,
are still able to answer comprehension questions correctly.
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< 

Students with learning disabilities, however, do not seem to have
these skills, and are thereby punished twice for a reading
handicap: once for being less able to read and comprehend the
passage, and a second time for being unable to "second guess" test
questions, as their nonhandicapped peers are apparently able to
do.

One possible explanation for this discrepancy between LD and
regular classroom students is that LD students are simply less
able to read (decode) the questions, and for that reason are less
able to outguess the test. That is, LD students are less
deficient in "test-taking skills" than they are in reading
ability. In order to address this question, a second experiment
was designed, in which ability to read would be controlled for.
Although the conditions in this experiment could not parallel
those of standardized test procedures, they did allow for an
assessment of the extent to which differential scores are
attributable to generally lower reading skills.

Experiment 2 
Method

Subjects and Materials 

The 42 subjects who participated in this investigation were
different students drawn from the same population as those of
Experiment 1, and consisted of 27 regular classroom third grade
students and 15 third grade children classified as LD by P.L.
94-142 and local district criteria. Mean grade equivalent for the
non-LD group (CTBS total reading) was 3.6 (SD=.$), and \A (SD=.4)
for the LD group. Materials were 14 items drawn from the
Stanford Achievement Test, Level P-3, Form F, and were chosen on
the same basis as those used in Experiment 1. Pages of the test
were again left intact with questions left in the original order
and the passages themselves blacked out during the copying
process.
Procedure

Students were informed by their teacher that they were about
to take a reading test without reading the corresponding passages.
They were told to listen while the teacher read each item, and
then answer the items. All students were given sufficient time to
answer all questions.

Results and Discussion 
The students in regular classrooms answered correctly 65% of
the fourteen items, for a mean score of 9.14 (SD = 1.8). The LD
students, on the other hand, answered correctly only 45% of the
items, for a mean score of 6.33 (SD = 1.8). Although both obtained
scores are well above chance (t(52) = 12.02 and t(28) = 4.325,
ps < .001, for regular classroom and LD students, respectively), the
regular classroom group maintained its advantage over the LD
students, t(40) = 4.87, p < .001. The results suggest that learning
disabled students may be less likely to apply test-taking
strategies to reading comprehension questions to a degree of
efficiency similar to their non-LD counterparts.
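
The group comparison can be approximated from the summary statistics alone. The sketch below assumes a pooled-variance independent-samples t test; the text does not say which procedure the authors used, and the function name is ours. With the reported means, standard deviations, and group sizes it yields a value of about 4.85 on 40 degrees of freedom, close to the reported t(40) = 4.87, with the small gap plausibly due to rounding in the published summary figures.

```python
from math import sqrt


def pooled_t(mean1: float, sd1: float, n1: int,
             mean2: float, sd2: float, n2: int) -> tuple[float, int]:
    """Independent-samples t statistic with pooled variance, from summary stats."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    std_err = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / std_err, df


# Experiment 2 summary statistics as reported in the text.
t_value, df = pooled_t(9.14, 1.8, 27,   # regular classroom group
                       6.33, 1.8, 15)   # LD group

print(f"t({df}) = {t_value:.2f}")
```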

General Discussion 
In Experiment 1, regular third grade classroom students were
seen consistently to outscore their LD counterparts on a test of
reading comprehension questions with corresponding passages
deleted, and administered under conditions resembling standardized
testing procedures. In Experiment 2, regular class third graders
again outscored LD students, under conditions for which reading
ability was controlled. The ability of third grade children in
these cases to score 55% and 65% correctly on questions which
refer to non-existent passages seems remarkable, and brings into
question the issue of what some tests of "reading comprehension"
are really measuring. Such passage-independent items have been
thought to assess test-taking skills and in fact have been used
as measures of "test-wiseness" (e.g., Derby, 1978). Although it
is suggested that differences in the use of test-taking strategies
(such as use of prior knowledge, deductive reasoning, and
elimination of implausible options) were responsible for much of
the observed performance differences, other explanations are
possible. Factors such as oral language decoding ability,
attentional deficits, and test anxiety may have played a part in
inhibiting performance on the part of the LD students. The role
of these other factors in LD test performance is currently being
investigated by the present authors (Scruggs, Bennion, & Lifson,
1984; Taylor & Scruggs, 1983). Whatever such tests are seen to
measure, however, it is clear that: (a) it is not "reading
comprehension," and (b) children classified as LD are at an
apparent disadvantage.

An argument can be made that these comparisons are of trivial 
importance, since in standardized test administration, passages 
are not deleted; that all children in fact have equal access to 
passages which contain answers to reading comprehension questions. 
Although this argument has a certain face validity, some problems 
remain. First, since non-LD students can score so high on such 
items without reading the passages, the extent to which scores are 
a direct measure of "reading comprehension" seems uncertain. 
Second, since nearly all such tests are timed, students with 
incomplete understanding of relevant passages, but possessing an 
ability to "outguess" test questions under time constraints, 
clearly are at an advantage with respect to students not 
possessing such an ability. In this case, differences in scores
on reading comprehension tests may in fact reflect in part a bias
toward students with superior ability to respond to specific cues
in the test-taking situation. Based on the present
experiments, LD students may well find themselves on the negative
side of any such bias.

The extent to which LD students and their non-LD counterparts differ
on the present measures appears to have surprisingly little to do
with reading ability. Although both groups gained when reading
(decoding) ability was controlled for, each group was seen to
exhibit the same degree of gain, amounting to about 10 percentage
points for each group. Reported t values in Experiments 1 and 2
remained virtually identical. It seems clear, then, that much of
the observed performance difference in Experiment 1 was due to
skills other than reading ability, or "reading comprehension."

Two steps may be taken to help alleviate this potential
source of bias. First, achievement tests should be revised so
that reading comprehension tests directly assess comprehension of
the provided passage. In fact, an informal review by the present
authors of the major achievement tests indicates that many
achievement test questions appear to be much less
"passage-independent" since the work of Tuinman (1973-1974) and others of a
decade ago (Scruggs & Lifson, 1985). Second, it seems possible
that at least some of these "test-taking skills" can be trained,
and that this training may do much to correct this apparent
disadvantage. The authors are at present investigating the
effectiveness of such training (Taylor & Scruggs, 1983) and
initial findings have been positive (Scruggs & Mastropieri, in
press; Scruggs & Tolfa, 1985). Although such improved scores on
tests may not necessarily reflect increased achievement, these
scores could reflect more accurately achievement gains students
have made, as evaluated by standardized achievement tests.
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LEARNING DISABLED STUDENTS' 
SPONTANEOUS USE OF 
TEST-TAKING SKILLS ON 
READING ACHIEVEMENT TESTS

Thomas E. Scruggs, Karla Bennion, and Steve Lifson

Abstract. The present investigation was undertaken to identify the type of
strategies learning disabled (LD) students employ on standardized
group-administered achievement test items. Of particular interest was level of
strategy effectiveness and possible differences in strategy use between LD and
nondisabled students. Students attending resource rooms and regular third-grade
classes were administered items from reading achievement tests and interviewed
concerning the strategies they had employed in answering the questions and
their level of confidence in each answer. Results indicated that (a) LD
students were less likely to report use of appropriate strategies on
inferential questions, (b) LD students were less likely to attend carefully to
specific format demands, and (c) LD students reported inappropriately high
levels of confidence.



Since the seminal article by Millman, Bishop,
and Ebel in 1965, attention has been focused on
test-taking skills, or test-wiseness, as a source of
measurement error in group-administered
achievement tests (Sarnacki, 1979). Defined as
"a subject's capacity to utilize the characteristics
and formats of the test and/or the test-taking
situation to receive a high score" (Millman et al.,
1965, p. 707), test-wiseness is said to include
such diverse components as guessing, time-use,
and deductive reasoning strategies. Given that
the effective use of such strategies may have little
relationship to a particular academic content
area, individuals or groups of individuals lacking
in these skills may be at a disadvantage. A
recently completed meta-analysis, for example,
suggested that under certain circumstances,
low-SES students are more likely to benefit from
achievement test coaching than higher SES
students, a finding which implies that low-SES
students are relatively deficient in test-taking
skills (Scruggs, Bennion, & White, 1984).

The present investigation was concerned with
learning disabled (LD) children's spontaneous
use of such strategies. Part of a larger
investigation involving test-taking skills of exceptional
students (Taylor & Scruggs, 1983), this study
was conducted to identify possible deficits in
test-taking skills on the part of LD children. Such
deficits, if uncovered, would be helpful in
developing remediation techniques.

Although much research has been conducted
on nonhandicapped populations' test-taking
skills (see Bangert-Drowns, Kulik, & Kulik,
1983; Sarnacki, 1979; and Scruggs et al.,
1984, for reviews), little is known about LD
students' test-taking skills. Scruggs and Lifson
(1984) recently investigated LD students'
differential ability to answer passage-independent
reading-comprehension test items (i.e.,
reading-comprehension test items for which relevant
passages had been omitted). Items were taken
from standardized achievement tests known
from previous research findings to be answerable
by individuals who had not read the associated
passage (Lifson & Scruggs, 1984), and thought
to be a good measure of test-wiseness. In two
experiments, nonhandicapped children scored
55% and 65% correct on such items, whereas
LD students from the same grade scored much
lower, even when word reading ability was
controlled. Scruggs and Lifson (1984) argued that
such findings also raised the question of what
reading-comprehension tests do measure since
no reading-comprehension test items should be
answerable without prior reading of the
associated passage. Scruggs and Lifson
concluded that LD children may be at a relative
disadvantage with respect to such test-taking skills as
guessing, elimination, and deductive reasoning
strategies applied to response items.

THOMAS E. SCRUGGS, Ph.D., is Research

Scruggs, Bennion, and Lifson (in press)
employed individual interview techniques to
determine the nature of the strategies
elementary-school children spontaneously
produced on reading-achievement tests. Students
representing a wide range of age and ability
levels were given reading-achievement test items
appropriate to their individual reading levels.
Results indicated that students employed a wide
range of strategies far beyond simply knowing or
not knowing the answer, and that the use of
these strategies was strongly predictive of
performance. These findings provided valuable
general information about the manner in which
children respond to reading-achievement test
items. However, the diversity of the population
in age and achievement level was thought to
have obscured observation of specific differences
in test-taking skills between age or ability levels.
The present investigation, therefore, was
intended to determine whether LD and nondisabled
students differed in strategy use on
reading-achievement tests. In this investigation, grade
level was held constant and the number of
subtests was reduced to two: a
reading-comprehension subtest, in which direct referring,
elimination, and deductive reasoning strategies
were thought to be important; and a letter-sound
subtest, in which close attention to format
demands was considered essential. In addition,
since level of reported confidence was found to
be a strong predictor of performance (Scruggs
et al., in press), and a prerequisite to strategy
monitoring, confidence reports were examined
for possible differences between ability groups.



Method 

Subjects 

Subjects were 32 third-grade students
attending public schools in a Western university
community. Twelve subjects were classified as
learning disabled (LD) according to local school
district criteria, which included a 40%
discrepancy between ability and performance in two
academic areas and PL 94-142 regulations.
Twenty subjects were regular-class students,
none of whom had been referred for special
services or were considered by their teachers to be
functioning at the highest achievement levels.
Although the LD and regular-class students
attended different schools, the schools were
adjacent, drawing their populations from the same
middle-class community. None of the students
qualified for their schools' free lunch program.
General cognitive ability appeared to be similar
for the two groups. Mean Full-Scale IQ for the
LD students (Wechsler Intelligence Scale for
Children-Revised) was 92.75 (SD = 5.7). Mean
Cognitive Skills Index for the non-LD students
(Test of Cognitive Skills) was 96.16 (SD = 9.5).
Mean grade equivalent for reading
comprehension on the Comprehensive Test of Basic Skills
(CTBS) for non-LD subjects was 3.9 (SD = .89),
equivalent to a percentile score of 61. For LD
students the mean CTBS
reading-comprehension grade equivalent was 2.3
(SD = .29), equivalent to a percentile score of
21. The 16 boys and 16 girls constituting the
sample were all 8-9 years old and Caucasian.
Sex was evenly represented both in LD and
non-LD groups.
Materials 

Two reading tests were constructed from items
taken from the Stanford Achievement Test. Test
items were drawn from the Primary 2 battery for
the instrument used with the LD group, whereas
the Intermediate 1 level served as the source for
the regular classroom group. Each test contained
three reading passages with 14 dependent
questions (10 content, 4 inference) on each form.
Comprehension questions were left in their
original order in relation to the selected passage.
Questions were renumbered to avoid gaps
where passages did not follow the sequential
order of the original test. In addition, three items
from the letter-sound test (Level P3) were
selected. These consisted of a stimulus word in
which a letter or letters were underlined to
represent a sound that the student was to identify
among three options given below the stimulus
word. These items contained distractors that
closely matched the initial consonants of the
stimulus word. For example, in the item:

    blind

    O blink
    O nibble
    O leaned

leaned is the correct answer, since it contains the
same sound as the underlined nd in the stem;
blink is the distractor, containing the same initial
consonant blend.
Procedure 

Subjects, seen individually by one of two
examiners, were asked to read the passages and
questions aloud and mark the answers they
thought were correct. Students were then told
that they would be asked to state if they were
sure or not sure that the selected answer was
correct, and the manner in which they had chosen
the particular answer. Subjects' responses to the
questions, "How did you choose that answer?"
and "Are you sure or not sure of your answer?"
were recorded verbatim on the protocol. Words
the experimenters had previously deemed
essential to answering the questions (key words)
were marked in the examiner's copy of the
instrument, and errors in these words were noted
as the child read aloud.
Scoring 

Test items were scored for correctness,
confidence in answer (sure/not sure), and type of
strategy reported. Two students from the
non-LD group, who had misread more than 25% of
the key words, were excluded from further
analysis. The responses were divided into seven
categories:

1 = Didn't know
2 = Guessed
3 = External source of knowledge (e.g., "I know all fish have scales")
4 = Referred to passage (e.g., "I read it")
5 = Quoted directly (e.g., "It says here that...")
6 = Eliminated options known to be incorrect
7 = Other reasoning (e.g., "It said comforted in the story. That sort of
    means relieved.")

Each response was evaluated in terms of the
above categories. Percent of agreement for
scoring was assessed at 100% after each examiner
scored 25% of the other examiner's protocols.

RESULTS

Results of a t test applied to percent of key
words read incorrectly indicated that the groups
did not differ significantly with respect to reading
difficulty, t(29) = .37, p > .20. Overall, LD
students misread 6.6% of 30 total key words,
whereas non-LD students misread 6.75% of 29
key words.

Proportion correct by collapsed strategy group
(inappropriate = strategies 1-3; referring =
strategies 4-5; reasoning = strategies 6-7)
was computed for item type and student group
(see Figures 1 and 2).

Strategy data were scored for appropriateness of reported
strategy. Strategies were considered appropriate if students
reported referring to the passage on a recall question (strategy 4
or 5), or if they reported a reasoning strategy in response to an
inferential question (strategy 6 or 7). Proportion of appropriate
responses was then entered into a 2 group (LD vs. non-LD) by 2 item
type (direct recall or inferential) analysis of variance (ANOVA)
with repeated measures on the item-type variable. Because of the
unequal group frequencies, a least-squares method of analysis
(Winer, 1971) was employed. Significant differences were found for
item type, F(1,29) = 9.19, p < .01, and interaction, F(1,27) =
7.58, p < .05. Figure 3 depicts graphically the interaction effect.
Both LD and non-LD students reported a high proportion of
referring-to-text strategies on recall questions (89% vs. 77%,
respectively). Nonsignificant differences were observed for overall
group means, F(1,29) = 1.54.
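The appropriateness rule described above can be written as a short function. This is a sketch under stated assumptions: the item-type labels "recall" and "inferential", the function names, and the sample reports are all hypothetical; only the rule itself (strategies 4-5 for recall, 6-7 for inferential) is taken from the text.

```python
def is_appropriate(item_type, strategy_code):
    """Apply the appropriateness rule stated in the text."""
    if item_type == "recall":
        return strategy_code in (4, 5)   # referred to or quoted the passage
    if item_type == "inferential":
        return strategy_code in (6, 7)   # elimination or other reasoning
    raise ValueError(f"unknown item type: {item_type!r}")

def proportion_appropriate(reports):
    """reports: non-empty list of (item_type, strategy_code) pairs for one student."""
    flags = [is_appropriate(item_type, code) for item_type, code in reports]
    return sum(flags) / len(flags)

# Hypothetical reports for one student, invented for illustration only.
sample = [("recall", 4), ("recall", 2), ("inferential", 6), ("inferential", 4)]
print(proportion_appropriate(sample))
```

Computing this proportion separately for recall and inferential items gives one value per student per item type, which is the kind of dependent measure a 2 x 2 repeated-measures ANOVA requires.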

Analysis of confidence reports revealed that both groups were
similar with respect to reported confidence level on referring to
passage strategies, with LD students reporting confidence in 85% of
the cases and non-LD students reporting confidence in 92% of the
instances. These reports were similar to actual performance, with
correct scores of 81% and 86% on these items for LD and non-LD
groups, respectively. On reasoning strategies, however, a different
picture emerged. Here regular-class students were correct on 83% of
the inferential items, compared to an average reported confidence
of 71% of the items. The LD students, on the other hand, reported
being confident an average of 95% of the cases, while being correct
in only 63% of these cases.
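The contrast between reported confidence and actual performance can be made explicit by subtracting percent correct from percent confident. The sketch below uses only the percentages reported above; the "overconfidence" index is an illustrative device rather than a measure used by the authors, and it assumes the two percentages in each pair refer to comparable sets of items.

```python
# Percentages reported in the text: (percent confident, percent correct).
reported = {
    ("LD", "referring"): (85, 81),
    ("non-LD", "referring"): (92, 86),
    ("LD", "reasoning"): (95, 63),
    ("non-LD", "reasoning"): (71, 83),
}

def overconfidence(confident_pct, correct_pct):
    """Positive values mean confidence exceeded accuracy."""
    return confident_pct - correct_pct

gaps = {key: overconfidence(*values) for key, values in reported.items()}
for (group, strategy), gap in gaps.items():
    print(f"{group:7s} {strategy:10s} {gap:+d} points")
```

On referring strategies the gap is small for both groups (+4 and +6 points). On reasoning strategies the LD group is 32 points overconfident, while the non-LD group is 12 points underconfident.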

Items on the letter-sound subtest were scored for responses which
suggested attention to an inappropriate distractor. This
inappropriate distractor took the form of an initial consonant
blend present in the stem, but not underlined. A comparison of the
number of inappropriate distractors by group revealed significant
differences, t(2S) = 2.47, p < .05. Thus, LD students chose the
inappropriate distractor in 52% of the cases, compared to the
non-LD children, who selected the inappropriate distractor in only
24% of the cases.

DISCUSSION 
The present sample of LD third graders, with reading ability
controlled for, differed from their regular-class counterparts with
respect to (a) proportion of appropriate reasoning strategies
reported for inferential comprehension questions, (b) performance
and confidence level for items in which reasoning strategies had
been reported, and (c) choice of an inappropriate distractor on a
letter-sound test. However, LD students did not differ from their
nondisabled peers in terms of appropriate strategy use on recall
items. Generally, this sample of LD children was seen (a) to report
fewer reasoning strategies, when appropriate, on reading
comprehension test items than their regular-class counterparts, and
(b) to be less successful on those items for which they reported
using reasoning strategies. These results support those reported by
Scruggs and Lifson (1984), who found that LD students exhibited
relatively inferior performance on a test of selected reading
comprehension test items for which the relevant passages had been
removed, and for which reasoning strategies were thought to be
necessary in order to answer the items correctly. The present
finding of inappropriately high confidence levels exhibited by the
LD students on items for which reasoning strategies had been
applied supports a theory of a developmental deficit in
meta-cognitive abilities (e.g., Torgesen, 1977), as inappropriately
high confidence levels in task performance are often seen in
younger children. Such a deficit on the part of LD children is
thought to be critical, since ability to evaluate accurately a
chosen response is a necessary prerequisite for effective
test-taking.

LD students' tendency to attend to an inappropriate distractor may
be a function of an attentional deficit (Krupski, 1980) on test
format as much as a deficit in phonetic skills. It is unclear
whether these test-taking skills are subject to remediation (Taylor
& Scruggs, 1983); regardless, they may reflect a source of
measurement error (Millman et al., 1965).
Reading comprehension seems to resist precise analysis and to be
the subject of many theoretical orientations (Spiro, Bruce, &
Brewer, 1980). If recall and inference are looked upon as two
component parts of reading comprehension, however, results of the
present investigation suggest that LD children demonstrate strategy
and performance deficits on inference questions, but not on recall
questions, with reading ability controlled for. Thus, it may be
argued that the specific deficits exhibited here reflect problems
in reading comprehension rather than test-taking skills. It seems
likely, therefore, that strategy training in such areas could lead
to improved reading comprehension as well as improved test-taking
skills, particularly since selecting and implementing appropriate
strategies has been found to improve general cognitive functioning
(e.g., Torgesen & Kail, 1980). In the word-study skills subtest,
however, the LD students apparently became confused by specific
format demands which likely had little to do with the content being
tested (i.e., matching on initial consonant blend rather than an
underlined vowel sound). Training for this type of strategy
deficit, therefore, cannot be expected to bring about a concomitant
increase in phonetic analysis skills.

Replication is necessary to further support and refine these
findings. The present results suggest that LD children may benefit
from specific training in (a) attending to specific format demands,
(b) identifying inference questions, and (c) selecting and applying
appropriate strategies relevant to such questions.

ATTITUDES OF BEHAVIORALLY DISORDERED
STUDENTS TOWARD TESTS¹

THOMAS E. SCRUGGS, MARGO A. MASTROPIERI,
DEBRA TOLFA AND VESNA JENKINS
Utah State University

Summary.--In two studies, attitudes reported toward testing by
behaviorally disordered students and their regular classroom
counterparts were compared. In Study 1, 12 behaviorally disordered
and 25 average fifth and sixth graders were given a survey
regarding their attitude toward tests and the test-taking
experience. Students classified as behaviorally disordered reported
less positive attitudes toward tests than their more average peers;
these attitude differences were more pronounced on items which
reflected subjective attitudes toward the test-taking situation and
aspirations about performance and less pronounced on evaluation of
the value of tests. In Study 2, which employed a sample of 25
behaviorally disordered and 25 regular classroom students matched
on age and sex and used a longer attitude measure, differences were
not found. Taken together, these studies suggest that attitudes
toward tests are inconsistent in the two populations and that some
behaviorally disordered students may not differ so much in this
regard as supposed.

Students classified as having behavioral disorders have often been
said to exhibit deficiencies in academic performance as measured by
standardized achievement tests (Motto & Wilkins, 1968; Stone &
Rowley, 1964). Kauffman (1981) has reviewed several studies which
examined the academic achievement characteristics of behaviorally
disordered students and concluded that often the performance of
these students falls far below their potential. Bases of these
academic deficits are not completely understood, but it is commonly
thought that behavioral disorders exhibited by this population have
a negative effect on academic achievement. It is possible, however,
that other factors also play a role in the generally lower
functioning of behaviorally disordered students. One of these
factors may be a difference in attitude toward the evaluation
process, particularly as evidenced by achievement tests. Since no
data document possible differences in attitudes toward tests and
the test-taking situation, the present pilot investigation was
intended to provide information on whether behaviorally disordered
students may differ from their more average peers with respect to
attitudes with which they approach the test-taking situation.
Results of such an investigation would not be expected to indicate
causal relations between attitudes and test performance but might
be of value to researchers interested in differences in
characteristic performance on achievement tests between
behaviorally disordered and more average students.

¹The research described here was supported in part by a grant from
the Department of Education, Special Education Programs, No.
G008300008. The authors thank Ms. Cathy Smith, Coordinator of
Special Education, Hillview Elementary School, Salt Lake City,
Utah, for her assistance with this project. Address requests for
reprints to Thomas E. Scruggs, Ph.D., UMC 68, Utah State
University, Logan, Utah 84322.

Study 1

Method 

Subjects were 37 fifth and sixth grade students attending a public
school in a western metropolitan community. Twelve of these
students had been classified as behaviorally disordered, and 25
were more typical fifth and sixth graders attending regular classes
in the same school. The principal criteria for identification as
behaviorally disordered were average ability coupled with social or
emotional functioning substantially different from that ordinarily
shown by some other students and supported by teachers' and
psychologists' observations and reports. Identification as
behaviorally disordered occurred after less intensive educational
and psychological interventions had not remediated the observed
deficiencies. All 12 behaviorally disordered students were
attending a self-contained class in the same school as the more
average fifth and sixth graders. The two groups were evenly
distributed with respect to grade; the sample of more average
students contained 12 fifth and 13 sixth graders, while the
behaviorally disordered sample contained 6 fifth and 6 sixth
graders.

The 12-item Test Attitude Survey was constructed as part of a
larger investigation involving the test-taking skills of learning
disabled and behaviorally disordered students (Taylor & Scruggs,
1983) and contained such items as "taking a test bothers me," "it
is important for me to do well on a test," and "tests are unfair."
"Yes" or "no" responses indicating agreement or disagreement with
the associated statement were solicited for each statement.
Internal consistency of this survey had been reported as .78
(Kuder-Richardson 20) on a previous administration to regular class
elementary school students, indicating a moderate level of
reliability for a survey of this nature. Students were given the
survey during regular classes and wrote an answer to each question
as the teacher read each item aloud. Students were given 1 point
for a positive response (i.e., "yes" to a positive statement, or
"no" to a negative statement) and 0 points for a negative response.
Tests were scored by independent scorers unaware of group
membership.
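The scoring rule and the Kuder-Richardson 20 coefficient mentioned above can be sketched as follows. This is an illustrative sketch, not the authors' procedure: the function names and the small data matrix are hypothetical, and the use of the population variance of total scores (dividing by n) is an assumption, since the text does not say which variance form was used.

```python
def score_response(answer, positively_worded):
    """Return 1 for a positive response, 0 otherwise.

    A positive response is "yes" to a positively worded statement
    or "no" to a negatively worded statement.
    """
    says_yes = answer.strip().lower() == "yes"
    return 1 if says_yes == positively_worded else 0

def kr20(score_matrix):
    """KR-20 for a students-by-items matrix of 0/1 scores."""
    n = len(score_matrix)       # number of students
    k = len(score_matrix[0])    # number of items
    totals = [sum(row) for row in score_matrix]
    mean_total = sum(totals) / n
    # Assumption: population variance of total scores.
    var_total = sum((t - mean_total) ** 2 for t in totals) / n
    if var_total == 0:
        raise ValueError("total scores have zero variance")
    pq_sum = 0.0
    for j in range(k):
        p = sum(row[j] for row in score_matrix) / n  # proportion scoring 1
        pq_sum += p * (1 - p)
    return (k / (k - 1)) * (1 - pq_sum / var_total)

# Hypothetical 4-student, 3-item matrix, invented for illustration only.
demo = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(kr20(demo))
```

For example, "no" to "tests are unfair" would be scored with `score_response("no", positively_worded=False)`, which returns 1.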

Results

The reliability of the survey for the present sample was .76
(KR-20), which was consistent with previous reports. Comparison of
total scores for the two groups indicated that the average group of
students had scored more positively than the behaviorally
disordered group. The regular fifth and sixth graders reported 63%
positive responses (M = 7.6, SD = 1.8), while the behaviorally
disordered students reported 47% positive responses (M = 5.6, SD =
2.4), a statistically significant difference (t(35) = 2.80, p <
.01).
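As a rough check, the reported t value can be approximated from the summary statistics alone. The sketch below assumes a pooled-variance independent-samples t test, which is consistent with the reported 35 degrees of freedom (25 + 12 - 2) but is not stated explicitly in the text; the function name is hypothetical.

```python
import math

def pooled_t(m1, sd1, n1, m2, sd2, n2):
    """Independent-samples t statistic with pooled variance."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, df

# Means, SDs, and group sizes as reported for Study 1.
t, df = pooled_t(7.6, 1.8, 25, 5.6, 2.4, 12)
print(f"t({df}) = {t:.2f}")
```

The result, about 2.84, is close to the reported 2.80; the small difference is what one would expect from rounding in the published means and standard deviations.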

In a supplementary analysis, factor analysis of responses for the
group as a whole yielded three factors with eigenvalues greater
than 1.00, which accounted for 67.5% of total test variance. A
principal components analysis, using Kaiser's criterion for factor
limitation, 1's in the diagonal, and varimax rotation (SPSS, 1983),
yielded factors of personal feelings about tests (e.g., "taking a
test makes me upset"), personal importance of tests (e.g., "it is
important for me to do well on a test"), and evaluation of the
worth of tests (e.g., "tests are unfair"). Items which loaded most
highly on each factor were compared between the two groups by means
of t tests. The two groups again differed on the first factor,
subjective feelings about tests (t(35) = 2.34, p < .025), and
Factor 2, subjective importance of tests (t(35) = 2.46, p < .02);
the two groups did not differ with respect to the third factor,
evaluation of the value of tests (t(35) = .84, p > .05).

Discussion 

Present results suggest that this sample of behaviorally
disordered children differed from their peers in attitudes
expressed toward tests and the test-taking situation. Although the
two groups did not appear to be different with respect to
evaluation of the role of tests, they did differ in their personal
feelings about tests. These findings seem to suggest that, although
the present sample of behaviorally disordered students appeared to
appreciate the worth or importance of tests, they reported much
less positive personal feelings about tests.

Several issues, however, can be raised which preclude drawing
conclusions from the present findings. First, the sample of
behaviorally disordered students is of insufficient size to permit
generalizations to a larger population or further subdivision,
e.g., by sex. Second, the attitude measure had too few items to
draw firm conclusions regarding subtest performance. Study 2, then,
was conducted to (a) confirm the present findings on a larger
sample of behaviorally disordered students and (b) expand the
attitude survey to contain more subtest items.

Study 2 

Method 

Subjects were 75 regular classroom students representing Grades 3
to 6 in a western metropolitan public school, and 25 students
attending self-contained classes for students with behavioral
disorders, Grades 3 to 6, in the same school. A different test
attitude survey was constructed to include two subtests of items
suggested by the factor analysis of Study 1: (a) items which
reflected feeling about self in a testing situation (e.g., "I feel
good when I take a test") and (b) items which reflected feelings
about the value of tests themselves (e.g., "Tests help the teacher
to see what we know"). This instrument had been piloted on a
different sample of 55 elementary school students. Assessment of
reliability gave a KR-20 of .74 for 22 items, and the two subtests,
(a) and (b) above, correlated weakly with each other (.11). This
low correlation suggested that separate aspects of testing
attitudes were being assessed.

The 22-item measure was then administered to the sample of
behaviorally disordered students and their peers in the students'
regular classrooms. Items were read to the students by their
teacher.
Results

Reliability (KR-20) of the attitude measure was .75. Reliability
of the subtest of "personal feelings" items was .64, while
reliability of the "value of tests" subtest was .59. Because the
two groups differed in distribution of age and sex, 25 subjects
were drawn from the peer group which were matched with the
behaviorally disordered students on these variables. The resulting
samples were virtually equivalent with respect to age (126.0 mo.
vs. 125.9 mo. for behaviorally disordered and regular class,
respectively) and sex distribution (21 members of each group were
boys).

Analysis of attitude responses indicated that groups did not
differ with respect to total score, score on "personal feeling"
items, or score on "value of tests" items (|t| < 1.00 in all
cases). On total items, scores for behaviorally disordered and
regular classroom students were, respectively, 16.5 (SD = 4.4) and
15.9 (SD = 3.1) out of a possible 22 positive responses. For
"personal feelings" items, scores were, in the same order, 10.3 (SD
= 2.9) and 9.8 (SD = 1.9) out of a possible 13 positive responses.
For "value of tests" items, scores were 6.2 (SD = 2.0) and 6.1 (SD
= 1.6) out of 9 possible positive responses. Although a further
breakdown by sex might have been interesting, the small number of
girls in each group would not permit this.

General Discussion

In Study 1, a small sample of behaviorally disordered students
reported less positive attitudes toward tests than did their
regular class peers. These differences appeared to reflect
differences in personal feelings regarding the testing situation
rather than attitudes concerning the utility and value of tests in
general, although the number of items was too small for conclusions
to be drawn. In Study 2, a larger sample of behaviorally disordered
and regular students matched on sex and age did not differ with
respect to reported personal feelings about tests, attitudes
concerning the value of tests, or total attitude. Although subjects
reflected several different grade levels, attitudes by grade level
could not be assessed due to the potential confounding of grade
level by classroom.

One possible reason for the discrepancy between Studies 1 and 2 is
that the subjects in Study 1 were not, for one reason or another,
representative of a larger population of behaviorally disordered
students. Another possibility, and one worthy of further
investigation, is that the discrepant findings reflect the fact
that Study 2 was conducted during the beginning of the school year,
when attitudes are commonly thought to be higher, while Study 1 was
conducted at the end of the previous year after students had
recently experienced testing. Further research is necessary to
assess this hypothesis. At present, however, it may be concluded
that some behaviorally disordered children might not differ so much
from those in regular classrooms with respect to attitudes toward
testing as might be thought.

Abstract

It has been seen that children's scores on reading achievement
tests vary not only with knowledge of content but also with the
differing formats of test items. Teachers working with learning
disabled children or children with attention problems may wish to
choose standardized tests with fewer rather than more format
changes. The present study evaluated the number of format and
direction changes, across tests and grade levels, of the major
elementary standardized reading achievement tests. The number of
format changes varies from one change every 1.2 minutes on the
Metropolitan Achievement Test Level El to one change every 21.3
minutes on the P1 level of the Stanford Achievement Test.
Teachers may wish to take this evaluation into account when
considering use of standardized reading achievement tests for
their students.

Format Changes in Reading Achievement Tests:
Implications for Learning Disabled Students

The validity of group administered achievement tests for
learning disabled and remedial reading students has been
questioned (Benson & Crocker, 1979). A score on a science test,
for example, should reflect the student's knowledge of the content
area and not be dependent on reading ability. It is important,
therefore, for the test maker to recognize bias related to such
reading material and to remove that bias (Benson & Crocker, 1979).
Another potential source of bias has been identified as test
formats and format changes (Carcelli & White, 1981). In one study
of reading achievement, children's responses to test items of the
same content, presented in different formats, varied from 45% to
92% correct (White, Carcelli, & Taylor, 1981). Although
standardization procedures can compensate in part for the
influence of test formats, it is important that a student's score
reflect, as accurately as possible, his/her knowledge of the
content being tested.

Children in grades lower than the fourth have attained
significantly lower test scores when the major format change of
using a separate answer sheet is introduced (Cashen & Ramseyer,
1969; Harcourt Brace Jovanovich, 1973; Ramseyer & Cashen, 1971).
The skill of completing the separate answer sheet appears to be
developmental in nature. While first and second graders do not
spontaneously or after training use separate answer sheets
efficiently (Ramseyer & Cashen, 1971), third graders have been
successfully trained in the use of separate answer sheets (McKee,
1967).

Learning disabled children, children with attention problems, and
children functioning below grade level may be even more adversely
affected by format changes. Scruggs, Bennion, and Lifson (in
press), in a study conducted with third grade learning disabled
students, demonstrated that LD students were more easily confused
and distracted by novel formats. These novel formats include the
use of separate answer sheets. Most standardized tests begin use
of separate answer sheets in fourth grade; the fifth grade LD
student, functioning two years behind, may also experience
difficulty with this task (Scruggs & Tolfa, 1985). Scruggs and
Tolfa (1985) have demonstrated that fourth grade LD students do
perform less accurately and with less speed on separate answer
sheets than do their normally functioning peers.

Given the extent to which different formats inhibit correct
responding, and the lesser ability of children at earlier
developmental stages, as well as the learning disabled student and
poor reader, to adjust to major format changes, teachers of such
students may wish to consider using reading achievement tests with
less frequent (rather than more frequent) format changes.
Teachers will prefer to use tests on which a student's scores are
affected more by knowledge of content than by the ability to adjust
quickly to format changes.

Teachers, however, do not often have the opportunity to alter
district decisions on which standardized tests are administered.
In such situations, training may be beneficial. Scruggs and
Mastropieri (in press) demonstrated that BD and LD students could
be successfully trained in test-taking skills involved with format
changes. Scruggs and Mastropieri (in press) found that the more
complicated the formats, the greater were the training gains.
Since format has been shown to be a variable influencing test
performance, the present investigation intended to compare the
number of format changes, across grade levels, of the major
standardized reading achievement tests. Levels from kindergarten
to seventh grade were included.

Procedure 

Reading subtests of the following standardized tests were
analyzed for format changes: the Stanford Achievement Test (SAT)
levels Primary 1, Primary 2, Primary 3, Intermediate 1, and
Intermediate 2; the California Achievement Tests (CAT) levels
10-17; the Metropolitan Achievement Tests (MAT) levels Primary 1,
Primary 2, Elementary, and Intermediate; the Iowa Tests of Basic
Skills (ITBS) levels 7-13; the Comprehensive Tests of Basic Skills
(CTBS) levels A-G; and the SRA Achievement Series levels A-F.

A format change was defined as a variation in the number of
options per item, a change from column to row or row to column, or
a change in either stem or options from word to picture to passage
to question to cloze item. Comparisons across tests and grade
levels were made by dividing the time allowed by the number of
formats in the test. For example, 20 minutes/4 formats means
that, in this case, there is a format change every 5 minutes.
Interrater agreement was established at 100% by two raters
discussing and recoding any independent disagreements in coding.
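The comparison index described above is a simple ratio of testing time to the number of formats (or format changes). A minimal sketch follows; the function name is hypothetical, the first call uses the worked example from the text, and the second uses the MAT Elementary values from Table 1 (40 minutes, 33 format changes).

```python
def minutes_per_unit(minutes_allowed, count):
    """Average minutes per format or per format change."""
    if count <= 0:
        raise ValueError("count must be positive")
    return minutes_allowed / count

# Worked example from the text: 20 minutes / 4 formats.
print(minutes_per_unit(20, 4))             # 5.0

# MAT Elementary level: 40 minutes and 33 format changes.
print(round(minutes_per_unit(40, 33), 1))  # 1.2
```

A smaller value means that students must adjust to a new format more often during the test.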

Results 

Format information specific to each individual test is
presented in Table 1. The standardized test with the least number
of formats is the Metropolitan Achievement Test, which has an
average of 3 formats across levels. The standardized test with
the least number of format changes is the SRA, which has an
average of 6 format changes. The SRA levels have one change every
13-16 minutes. The tests with the greatest number of formats are
the California Achievement Test and the Iowa Test of Basic Skills,
both of which have an average of 8 formats. The standardized test
with the greatest number of format changes is the Stanford
Achievement Test. The SAT has an average of 18 format changes,
with the Intermediate 2 level showing 32 format changes, or a
change every 2.6 minutes.

Insert Table 1 about here

The mean of the format changes across grade levels varies
from one change every 6.1 minutes at grades 2-3 to one change
every 12.75 minutes at grade K.

Discussion

Children's test scores vary not only with knowledge of
content, but also with the differing formats of test items.
Teachers of children with learning or attentional difficulties may
wish to consider various options to help ensure all possible bias
is eliminated from standardized tests. Teachers and school
districts should consider using standardized tests with the lower
numbers of format changes. When it is not possible to control the
tests administered, the teacher should provide practice and
training with difficult formats. In addition, if a teacher
suspects that students have difficulty adjusting to new formats,
she or he may prefer to use a test which allows a reasonable
amount of time before switching to a different format. The number
of format changes on the major standardized reading achievement
tests varies from 1 change every 1.2 minutes on the Metropolitan
Achievement Test to 1 change every 21.3 minutes on the Stanford
Achievement Test.

Although the teacher should always exhibit caution when
interpreting test results, extra care can be taken when problems
with format changes are suspected.

Table 1

Test   Level  Grade     Minutes  Formats  Minutes per    Format   Minutes per
                                          format change  changes  format

CAT    10     K.0-K.9   116      7        16.7           7        16.6
       11     K.6-1.9   57       8        5.2            11       7.1
       12     1.6-2.9   59       9        5.8            12       7.7
       13     2.6-3.9   69       11       2.8            24       6.3
       14-19  3.6-7.9   45       5        7.5            6        9
       Mean                      8        7.6            12       9.3

CTBS   A      K.0-K.9   53       5        8.8            6        10.6
       B      K.6-1.6   45       5        5.6            8        9
       C      1.0-1.9   65       6        7.2            9        10.8
       D      1.6-2.9   64       8        7.1            9        8
       E      2.6-3.9   70       8        7.8            9        8.8
       F      3.6-4.9   59       9        6.3            11       7.7
       G      4.6-6.9   50       9        5.5            11       6.7
       Mean                      7        6.9            9        8.8

ITBS   7      1.7-2.6   68       10       3.8            18       6.8
       8      2.7-3.5   68       12       2.3            40       5.7
       9-14   3-7       57       3        14.3           4        19
       Mean                      8        6.8            17       10.5

MAT    P1     1.5-2.4   45       3        15.0           3        15
       P2     2.5-3.4   40       2        3.3            12       20
       El     3.5-4.9   40       3        1.2            33       13.3
       Int    5.0-6.9   40       3        2.4            17       13.3
       Mean                      3        5.5            16       15.4

SAT    P1     1.5-2.9   85       4        21.3           4        21.3
       P2     2.5-3.9   90       8        6.0            15       11.25
       P3     3.5-4.9   80       9        6.7            12       8.9
       I1     4.5-5.9   85       8        3.1            27       10.6
       I2     5.5-7.9   85       8        2.6            32       10.6
       Mean                      7        8              18       12.5

SRA    A      K.6-1.5   97       6        13.9           7        16.2
       B      1.6-2.5   115      7        16.4           7        16.4
       C      2.6-3.5   85       6        14.2           6        14.2
       D      3.5-4.5   48       3        16.0           3        16
       E      4.6-5.5   50       4        12.5           4        12.5
       F      6.6-8.1   50       4        12.5           4        12.5
       Mean                      5        14.3           5.2      14.6

Abstract

Results of 24 studies which investigated the effects of training
elementary school children in test-taking skills on standardized
achievement tests were analyzed using meta-analysis techniques.
In contrast to the conclusions of all previous reviewers, the
results of this analysis suggest that training in test-taking
skills has only a very small effect on students' scores on
standardized achievement tests. Longer training programs are more
effective, particularly for students in grades 1-3 and for
students from low socioeconomic status backgrounds. Results from
previous reviews of this body of literature are critiqued, and
explanations are offered as to why the results of the present
investigation are somewhat contradictory to previous reviewers'
conclusions. Suggestions for further research are given.

Improving Achievement Scores

Teaching Test-Taking Skills to Elementary
Grade Students: A Meta-Analysis

Since the seminal work of Millman, Bishop, and Ebel (1965),
much attention has been directed to the influence of test-taking
skills, or "test-wiseness," on scores of achievement tests.
Assumptions from the past have included that test-wiseness is a
substantially separate variable not strongly correlated with
intelligence (Diamond & Evans, 1972), that test-taking skills are
alterable by training, and that these skills would transfer to
higher scores on achievement tests (Ford, 1973; Fueyo, 1977;
Sarnacki, 1979).

Training materials have been created (some of which are
commercially available) to teach "test-taking skills" (e.g.,
Mini-Tests, 1979 and Test-Taking Skills Kit, 1980), and claims have
been made that such training leads to increased test scores (e.g.,
Fueyo, 1977; Jones & Ligon, 1981; Samson, 1984). The rationale
for such training programs stems from the common practice of
utilizing results from achievement tests to assist in making
decisions about educational placement, programming, and
evaluation. To the degree that achievement tests are measuring
test-taking skills rather than mastery of the content being tested
(e.g., reading, math), decisions about placement, programming, and
evaluation may be incorrect (see Ebel, 1965, for additional
discussion). Promoters of teaching test-taking skills have
claimed that students would obtain higher scores if deficiencies
in test-taking skills were remediated, thus resulting in a more
valid indicator of how well the student had mastered the content
the test was designed to assess.

Although efforts to reduce measurement error in standardized
achievement testing are commendable, several questions remain:

1. Although many people have concluded that test-taking
skills training leads to increased test scores, is that position
consistently supported empirically, and what is the magnitude of
typically obtained effects?

2. Can the cost of typical test-taking training programs be
justified in view of the magnitude of observed effects and the
alternative uses of the same resource (i.e., is it
cost-effective?)?

3. Are some types of training more valuable than others in
increasing performance on achievement tests, and are some groups
of children more likely than others to benefit from such
training?

The purpose of the present investigation was to integrate the
results from previous research to answer the preceding questions
as they pertain to standardized achievement tests with elementary
school-aged children.
Review of Previous Work 

Several reviewers have previously examined the effects of
teaching test-taking skills (Bangert-Drowns, Kulik, & Kulik, 1983;
Ford, 1973; Fueyo, 1977; Jones & Ligon, 1981; Sarnacki, 1979;
Taylor, 1981). A summary of the characteristics and conclusions
of these reviewers is shown in Table 1.

Insert Table 1 about here

All previous reviewers concluded that test-taking skills could be
taught effectively and resulted in benefits for children
(including higher achievement test scores). Unfortunately, except
for Bangert-Drowns et al. (1983) and Taylor (1981), previous
reviews failed to indicate the procedures or criteria for
including research studies in their review, did not cite and
critique prior reviews, and apparently analyzed results of
the primary research included in their review only in terms of the
original researchers' conclusions. As will be shown below, all of
the reviewers failed to include a substantial number of studies
with elementary-aged children. Consequently, one cannot be
confident that results cited in these reviews are representative
of available research. It is also difficult to draw conclusions
about the magnitude of the alleged effect of training students in
test-taking skills since most of the reviewers stated only that
differences were found, or improvement was noted, and occasionally
referred to statistically significant differences between groups.
Without knowing more about the magnitude of the effect
attributable to teaching test-taking skills, it is difficult to
draw conclusions about whether it is likely to be a wise
investment to divert resources from other activities (e.g.,
teaching reading) to teach test-taking skills.

Taylor (1981) conducted an excellent review on the effects of
practice, coaching, and reinforcement on test scores. This
investigation focused upon all age levels and on
group-administered as well as individually administered tests. The
great majority of studies selected, in fact, concentrated on
either IQ tests or non-elementary age populations; consequently, a
substantial number of studies which investigated the effects of
training achievement test-taking skills with elementary-aged
children were not included in her review.

The most comprehensive analysis to date of the effect of
teaching test-taking skills on achievement test scores was a
meta-analysis recently completed by Bangert-Drowns et al. (1983).
The effect of teaching test-taking skills for elementary- and
secondary-aged children was analyzed by computing a standardized
mean difference effect size for each study (Glass, 1977) to
indicate the extent to which achievement test scores were altered
by training. This was a substantial improvement over most earlier
reviews, which relied primarily on authors' conclusions or tests of
statistical significance without indicating the magnitude of
effects. Knowing the magnitude of improvement is very important
so that practitioners can make judgments concerning whether the
investment in training is cost-effective compared to what else
could have been accomplished with that time. Bangert-Drowns et
al. (1983) concluded that teaching test-taking skills raised
standardized achievement test scores by .25 standard deviations,
enough to raise the typical score from the 50th to the 60th
percentile. They also concluded that length of training program
was positively related to effect size; drill and practice was less
effective than training in "broad cognitive skills;" and
effectiveness of training was not affected by identifiable subject
characteristics or other characteristics of the program.
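
The translation from a .25 standard deviation gain to a percentile shift can be checked with a short calculation. The sketch below is only illustrative: it assumes achievement scores are normally distributed, and the function name `percentile_after_shift` is hypothetical rather than anything used by Bangert-Drowns et al.

```python
from statistics import NormalDist


def percentile_after_shift(effect_size, start_percentile=50.0):
    """Percentile reached by a score that starts at `start_percentile`
    and rises by `effect_size` standard deviations.

    Assumes a normal score distribution (an assumption of this sketch).
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    start_z = std_normal.inv_cdf(start_percentile / 100.0)
    return 100.0 * std_normal.cdf(start_z + effect_size)


if __name__ == "__main__":
    # A gain of .25 SD from the 50th percentile lands near the 60th.
    print(round(percentile_after_shift(0.25), 1))
```

Under the same normality assumption, the function can be reused for any other effect size discussed in this review.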

Although Bangert-Drowns et al. provided valuable information,
their study is limited by several factors. First, a number of
studies have been done which were not included in their review.
Secondly, although indicators of study quality were coded, there
was no report of efforts to determine if there were differential
effects for studies of high versus low quality. It may be, for
example, that investigations of lower quality produce effect sizes
which are substantially different from (and also less credible
than) those of studies of high quality.
Third, their decision to average all outcomes from a given
study into one measure of effect size can be misleading. For
example, Levine (1980) randomly assigned low SES and not low SES
fifth graders to either test-taking training or control groups and
collected data on students' scores on standardized reading
achievement and an assessment of "test-wiseness." Four obvious
effect sizes are possible: low SES experimental versus control
for reading and test-wiseness; and not low SES experimental versus
control for reading and test-wiseness. These four effect sizes
range from .38 to 1.52 and average .90. To report only the
average of all four is not only misleading, but irretrievably
obscures important differences between types of subjects and types
of outcome (e.g., in this study the effects for low SES subjects
were much larger than for "not low SES" subjects for both outcomes,
and effects for test-wiseness were much larger than for reading
achievement for both groups).

Finally, in some instances Bangert-Drowns et al. appear to
have used inappropriate computations for determining the effect
size. For example, in the Romberg (1978) study, classrooms were
randomly assigned to treatments, and class averages were used as
the unit of analysis. While the use of classroom means as the
unit of analysis is an appropriate statistical procedure (Peckham,
Glass, & Hopkins, 1969), the standard deviation of group means
will generally be much smaller than the within-group standard
deviation. The use of the between-class standard deviation will
result in a much larger effect size and will not be comparable to
studies for which the within-group standard deviation was used.
In the Romberg study, Bangert-Drowns et al. apparently used the
between-class standard deviation for achievement test scores and
obtained an effect size of .48. By contrast, the present authors
estimated the effect size (since within-group standard deviations
were not reported) by converting the reported percentile scores to
Z scores and using differences in Z scores as the effect size.
This procedure yielded an effect size based on the within-group
standard deviation of only .14, less than one third the magnitude
of the Bangert-Drowns et al. estimate.
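
The percentile-to-Z conversion described above can be sketched as follows. This is a minimal illustration, assuming the percentiles refer to a normal norm-group distribution; the function names and all numeric inputs are hypothetical and are not the values reported by Romberg (1978).

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, SD 1


def percentile_to_z(percentile):
    """Convert a percentile rank (strictly between 0 and 100) to a Z score."""
    return _STD_NORMAL.inv_cdf(percentile / 100.0)


def effect_size_from_percentiles(treatment_pct, control_pct):
    """Difference in Z scores, expressed in within-group SD units."""
    return percentile_to_z(treatment_pct) - percentile_to_z(control_pct)


def standardized_difference(mean_difference, sd):
    """Mean difference divided by whichever standard deviation is chosen."""
    return mean_difference / sd


if __name__ == "__main__":
    # Hypothetical group percentiles, not Romberg's actual values.
    print(round(effect_size_from_percentiles(55.0, 50.0), 2))
    # Hypothetical raw-score example: one mean difference, two standardizers.
    print(round(standardized_difference(2.0, 14.0), 2))  # within-group SD
    print(round(standardized_difference(2.0, 4.0), 2))   # SD of class means
```

The last two lines illustrate the comparability problem: dividing the same mean difference by the smaller standard deviation of class means yields a much larger effect size.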

Other important questions remain unaddressed by
Bangert-Drowns et al. (1983). First, many investigators believe that
the training of test-taking skills is particularly beneficial for
children in low socioeconomic settings (e.g., Jones & Ligon, 1981;
Jongsma & Warshauer, 1975). Thus, it is important to determine
whether teaching test-taking skills has a different effect on
children of low socioeconomic status than it does on children who
do not come from such groups. Secondly, it is important to
determine whether the effects of training in test-taking skills
are different for children of different ages. In the
Bangert-Drowns et al. study, students in grades 1 to 6 were combined
into one category. Third, it is important to replicate their findings
about length of training and type of training, and to determine
whether there are any other important concomitant variables or
interactions among variables not identified by Bangert-Drowns et
al. Finally, it is important to know whether studies of adequate
validity produce different effect sizes from studies of less than
adequate validity, and whether there is a differential effect for
different types of dependent measures (e.g., achievement tests,
measures of test-wiseness, student attitude).

Procedure

Location of studies. Several procedures were used to find as
many studies as possible which investigated the effect on
group-administered standardized achievement test scores of teaching
test-taking skills to elementary-aged school children. Studies which
examined attempts to improve, for example, scores on
individualized achievement tests or IQ tests were excluded from
this analysis. Also excluded from analysis were studies which
investigated the effects of training on achievement test
performance of students above the 6th grade level.

Studies were located by first conducting a computer-assisted
search of the Dissertation Abstracts International, Psychological
Abstracts, and Educational Resources Information Center (ERIC)
data bases. Studies found in this way were examined to determine
whether they contained references to other appropriate studies.
Previous reviews of research on teaching test-taking skills
(Bangert-Drowns et al., 1983; Ford, 1973; Fueyo, 1977; Jones &
Ligon, 1981; Sarnacki, 1979; Taylor, 1981) were also examined for
additional studies. Twenty-four experimental studies of the
effects of teaching test-taking skills on achievement tests for
students in grades 1 through 6 were located. This number is 70%
greater than the greatest number of studies involving achievement
tests for elementary school children found by any previous
reviewer.

Coding. Each study was coded for 14 different variables
which described the type of subjects with whom the research was
conducted, the type of training provided, the experimental design
used, and the type of outcome data collected. The specific
variables coded are reported in Table 2 in the results section.
Interrater consistency was established by having two independent
reviewers code each article. Wherever disagreement occurred,
differences were resolved by discussion.

To enable the comparison of all outcomes across all studies,
an effect size for each relevant comparison was computed (Glass,
McGaw, & Smith, 1981). Effect size was defined as the mean
difference between two groups divided by the standard deviation of
the control group. When means and standard deviations are not
reported in a study, effect sizes can also be calculated from
other statistics such as t and F. Basic conventions for
determining which effect sizes to code, and methods of calculation
when means and standard deviations were not available, are given
in Casto, White, and Taylor (1983).

In addition, obtained effect sizes were adjusted using
Hedges' (1981) formula for bias correction of the effect size
estimator before analyses were done. Although the correction
procedure was used for all results in the present study, the
authors agree with Bangert-Drowns et al. that the overall
difference in effect sizes due to this correction procedure was
trivial (only 1 out of 65 effect sizes changed by more than .01 of
an effect size).
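
The effect size definition and the small-sample correction described above can be sketched in a few lines. This is an illustrative reading, not the authors' actual procedure: it uses the common approximation to Hedges' correction factor, 1 - 3/(4m - 1), with m taken as the control-group degrees of freedom because the control-group standard deviation is the standardizer. The conversion from t assumes an independent-groups t test and gives an effect size in pooled standard deviation units. All function names and example numbers are hypothetical.

```python
import math


def glass_effect_size(mean_treatment, mean_control, sd_control):
    """Mean difference divided by the control-group standard deviation."""
    return (mean_treatment - mean_control) / sd_control


def hedges_correction(df):
    """Approximate small-sample bias correction factor, 1 - 3 / (4*df - 1)."""
    return 1.0 - 3.0 / (4.0 * df - 1.0)


def corrected_effect_size(mean_treatment, mean_control, sd_control, n_control):
    """Bias-corrected effect size; df = n_control - 1 is assumed here
    because the control-group SD is used as the standardizer."""
    raw = glass_effect_size(mean_treatment, mean_control, sd_control)
    return raw * hedges_correction(n_control - 1)


def effect_size_from_t(t_value, n_treatment, n_control):
    """Effect size implied by an independent-groups t statistic
    (pooled-SD units; an assumption of this sketch)."""
    return t_value * math.sqrt(1.0 / n_treatment + 1.0 / n_control)


if __name__ == "__main__":
    # Hypothetical study: means 52 vs. 50, control SD 10, 30 control students.
    raw = glass_effect_size(52.0, 50.0, 10.0)
    adjusted = corrected_effect_size(52.0, 50.0, 10.0, 30)
    print(round(raw, 3), round(adjusted, 3), round(raw - adjusted, 3))
```

With group sizes of this order the adjustment is well under .01, which is consistent with the authors' remark that the correction made little practical difference.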

Results and Discussion

The 24 investigations of the effect of teaching test-taking
skills resulted in 65 effect sizes which were relatively evenly
distributed among studies. The mean effect size for all
comparisons, including achievement tests, tests of test-wiseness,
self-esteem, and anxiety, was .21, a figure which is consistent
with that of Bangert-Drowns et al. but should be interpreted with
caution since it is the average across different types of
dependent measures, studies of differing quality, and students
with different characteristics.

Table 2 shows the mean effect size for all levels of the
different variables coded in the meta-analysis. As can be seen,
the average effect size for studies with adequate validity is
relatively close to that of studies with inadequate validity (.20
vs. .29). Although this suggests that it may not be necessary to
account for quality of study in interpreting the impact of
training students in test-taking skills, further examination of
Table 2 shows that this is not the case. In particular, we note
that the average of 44 effect sizes for achievement test scores
from studies of adequate validity is .10, while the average of 6
effect sizes from adequate studies measuring "test-wiseness" is
.71, almost 10 times as large. There are also no measures of
test-wiseness or measures such as anxiety, self-esteem, and
attitude towards the test which come from studies with inadequate
validity. Thus, the apparent equivalence in average effect sizes
between studies of adequate validity and inadequate validity is
largely attributable to the fact that outcomes other than
achievement all come from studies of adequate validity and yield
substantially higher effect sizes than measures of achievement.

Insert Table 2 about here

The mean effect size for achievement test scores from studies
of adequate validity is only .10 compared to an average of .29 for
achievement test scores for studies with inadequate validity.
This contrasts sharply with the findings of Bangert-Drowns et al.,
who reported an average effect size of .25. Part of the reason
that Bangert-Drowns et al. found a higher average effect size may
have been that they collapsed several different outcome measures
from the study into one average effect size. As noted above, this
can be misleading and prevents analyses of important issues.
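
The way a pooled average can mask subgroup differences can be verified directly from the subgroup means. The sketch below recombines means weighted by the number of effect sizes, using the means and counts as printed in Table 2; the function name `pooled_mean` is hypothetical, and the calculation is offered only as a rough consistency check.

```python
def pooled_mean(groups):
    """Mean effect size across groups, weighted by number of effect sizes.

    `groups` is a list of (mean_effect_size, number_of_effect_sizes) pairs.
    """
    total_n = sum(n for _, n in groups)
    return sum(mean * n for mean, n in groups) / total_n


# Adequate-validity studies by outcome type, as printed in Table 2:
# achievement tests, test-wiseness tests, other measures.
adequate_by_outcome = [(0.10, 44), (0.71, 5), (0.44, 6)]

# All effect sizes by study validity: adequate, inadequate.
all_by_validity = [(0.20, 55), (0.29, 10)]

if __name__ == "__main__":
    print(round(pooled_mean(adequate_by_outcome), 2))
    print(round(pooled_mean(all_by_validity), 2))
```

The eleven non-achievement effect sizes are enough to pull the adequate-validity average from .10 up to roughly .19 or .20, which is why that average looks similar to the .29 obtained from inadequate studies.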

Because there is such a dramatic difference in average effect
size between studies with adequate validity and studies with
inadequate validity, and between measures of achievement and other
measures, the remaining analyses will focus primarily on effect
sizes of achievement tests from studies with adequate validity.

The mean effect sizes for achievement test scores from
studies with adequate validity for different levels of length of
treatment, SES level, and grade level are shown in Table 3.

Insert Table 3 about here

As can be seen, there was considerable difference between
interventions which were less than 4 hours and those which were 4
or more hours (.04 vs. .29). A similar finding was seen when
results of achievement test scores were broken down by grade
level. When treatments were administered to students in the
primary grades (1-3), the average effect size on standardized
achievement tests was only .01. For grades 4-6, however, the
mean effect size for achievement tests was much higher, .20. The
difference between students of differing socioeconomic background
was very slight (ES = .14 vs. ES = .09), with a very small
advantage for students from low socioeconomic backgrounds.

Even more interesting than the average effect size for
different levels of these three variables are the interactions
between the variables. As can be seen in Figure 1, for treatments
involving less than 4 hours, students in the primary grades
exhibited slightly negative effect sizes (ES = -.12) while
students from grades 4 through 6 had an average effect size of
.19. For students receiving 4 or more hours of training,
however, there is no difference: students in both grades 1-3 and
4-6 had an average effect size of .29. Although the mean effect
size for students in grades 1-3 with 4 or more hours of treatment
is based on only four studies, these data are provocative and
require further investigation. More specifically, it appears that
for older students, a short amount of training in test-taking
skills may result in substantial improvement. However, for
younger children, it takes much more training before there are
observable benefits.
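
The interaction between treatment length and grade level can be expressed as a difference between two simple effects. The sketch below uses the four cell means quoted in the text for Figure 1; the dictionary layout and function names are hypothetical, and no significance test is implied.

```python
# Mean effect sizes for achievement tests (adequate-validity studies),
# keyed by (grade band, treatment length), as quoted in the text.
CELL_MEANS = {
    ("grades 1-3", "less than 4 hours"): -0.12,
    ("grades 1-3", "4 or more hours"): 0.29,
    ("grades 4-6", "less than 4 hours"): 0.19,
    ("grades 4-6", "4 or more hours"): 0.29,
}


def gain_from_longer_training(grade_band):
    """Simple effect of treatment length within one grade band."""
    return (CELL_MEANS[(grade_band, "4 or more hours")]
            - CELL_MEANS[(grade_band, "less than 4 hours")])


def interaction_contrast():
    """Difference between the two simple effects (primary minus upper)."""
    return (gain_from_longer_training("grades 1-3")
            - gain_from_longer_training("grades 4-6"))


if __name__ == "__main__":
    for band in ("grades 1-3", "grades 4-6"):
        print(band, round(gain_from_longer_training(band), 2))
    print("interaction contrast", round(interaction_contrast(), 2))
```

On these figures longer training is associated with a gain of about .41 for the primary grades but only about .10 for grades 4-6, which is the pattern the paragraph above describes.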

Figure 2 shows another interesting interaction between length
of training and socioeconomic status. With less than 4 hours of
treatment, neither "low SES" nor "not low SES" subjects benefited
appreciably (average effect sizes are .05 and .08). With high
levels of treatment, students from low socioeconomic backgrounds
benefit more than twice as much as students who are not from low
socioeconomic backgrounds (average effect size = .44 vs. .20).
Again, this finding requires further replication before confident
conclusions can be drawn, but it suggests that authors who have
contended that training in test-taking skills is most important
for students from low socioeconomic backgrounds (e.g., Jones &
Ligon, 1981; Jongsma & Warshauer, 1975) may be correct.

Before drawing conclusions about the efficacy of training
students in test-taking skills, it is important to comment briefly
on the differences in average effect sizes between outcomes of
achievement test scores (ES = .10), tests of test-wiseness
(ES = .71), and measures of anxiety, self-esteem, and attitude
towards tests (ES = .44). Admittedly, the measures other than
scores on achievement tests are based on a very limited number of
studies, so one should be cautious in drawing conclusions.
However, from these data, it appears that tests of test-wiseness
are more sensitive to training effects. One explanation for this
much larger average effect size is that the training program is
"teaching to the test." The fact that high scores on tests of
test-wiseness are not necessarily related to higher achievement
test scores suggests that the relation between test-wiseness and
high scores on achievement tests is not very strong. It should be
remembered that the primary argument for providing training in
test-taking skills to students has always been related to the need
to reduce measurement error in the child's standardized test
score. To the degree that that is happening, it has been assumed
that test scores would go up. Although the fact that test scores
are not going up appreciably is not proof that scores are not more
accurate, it still leaves the burden of proof upon those who claim
that training in test-taking skills is beneficial. Higher scores
on tests of test-wiseness are not sufficient evidence for those
benefits.



Conclusions

As noted earlier, this integrative review was designed to
answer the following three questions:

1. To what degree is the popular position that training in
test-taking skills is beneficial for children supported by
empirical evidence?

2. Do the data about the effect of teaching test-taking
skills justify the use of resources for this purpose as opposed to
alternative uses of the same resources?

3. Are some types of training more effective, or are some
groups of children more likely to benefit from training in
test-taking skills?

In response to the first question, the results of this review
stand in contrast with those of all previous reviewers of the effects
of training in test-taking skills. The most credible evidence
(results from high quality studies limited to scores on
standardized achievement tests), at least as it pertains to
elementary school-aged children, does not demonstrate a sizeable
benefit for teaching test-taking skills. The reason for these
different conclusions is partly attributable to the use of more
systematic techniques than were used by many of the previous
reviewers to identify the magnitude of the effect and how that effect
covaried with other variables. More importantly, a larger number
of studies was identified, and quality of study and type of outcome
were accounted for.

Is training in test-taking skills cost effective? The answer
is not clear-cut. Clearly, benefits of a tenth of a standard
deviation are relatively small (less than one month's worth of gain
in reading for an average third grader), but they were obtained at
relatively little cost. Even the longest training program lasted
only 20 hours, and the majority of effect sizes came from studies
in which training lasted less than 4 hours. The question also
depends in part on whether one is talking about children in grades
1-3 or grades 4-6. These data suggest that for older children, a
limited amount of training can have a discernible effect. For
younger children, more training is necessary. Also, the fact that
a few studies (unfortunately, a very limited number) suggest
that training in test-taking skills has some positive impact on
anxiety, self-esteem, and attitude towards tests should not be
forgotten. However, before it is accepted as fact, more research
needs to be done. It is clear that a comprehensive analysis of
previous research on training test-taking skills suggests that the
benefits are not nearly so great as has typically been concluded.
Data from the meta-analysis do suggest that training in
test-taking skills is differentially effective for various subgroups
of children. The interactions between length of treatment and grade
level, and length of treatment and SES, are particularly
provocative and deserve further research. In general, the
meta-analysis supports the conclusion of Bangert-Drowns et al. that
longer training programs are more effective. As a general
strategy, it also appears that training is more effective in the
upper elementary grades than in the lower elementary grades.
Whether or not a training package includes practice tests,
reinforcement, or drill and practice does not seem to be an issue
about which we have sufficient data to draw conclusions. More
research is needed before we can decide what types of training are
most effective.

Should training in test-taking skills be pursued? Hopefully,
the results of this analysis will temper some of the unfounded
enthusiasm in support of training children in test-taking skills.
However, it would be unwise to conclude that training in
test-taking skills is unwarranted or detrimental. Although the
effects of such training are small, the investment is relatively cheap,
and there is some evidence that for particular groups of children,
training in test-taking skills can have substantial effects.
Those tentative conclusions need further research, but indicate an
area worth pursuing.

Table 1



Characteristics and Conclusions of Previous Reviewers of the
Effect of Teaching Test-Taking Skills

                                                      Outcomes of   Conclusions
                No. of        Methods for Previous    experimental  about
                experimental  selecting   reviewers   studies       effectiveness of    Variables cited
                studies       studies     cited and   cited in      training test-      which covary with   Type of studies
Author/year     cited         specified?  critiqued?  terms of      taking skills       effect of training  included

Bangert-Drowns  30            Yes                     Standardized  Effective           Length of training  Achievement tests;
et al./1983                                           effect size   ES = .25            program, type of    elementary and
                                                                                        training            secondary level

Ford/1973       24            No          No          Conclusions   Effective           None                Achievement, IQ,
                                                                                                            and aptitude tests;
                                                                                                            preschool through
                                                                                                            adult

Fueyo/1977      19            No          No          Conclusions   Effective           None                Achievement, IQ,
                                                                                                            and aptitude tests;
                                                                                                            preschool through
                                                                                                            adult

Jones & Ligon/                No          No          Conclusions   Effective           Maintenance of      Achievement, IQ,
1981                                                                                    effect;             and aptitude tests;
                                                                                        socioeconomic       preschool through
                                                                                        status              adult

Sarnacki/1979   17            No          No          Conclusions   Effective           None                Achievement, IQ,
                                                                                                            and aptitude tests;
                                                                                                            preschool through
                                                                                                            adult

Taylor/1981     34            Yes         Yes         Standardized  Effective           Type of training,   Achievement, IQ,
                                                      effect size   ES = .62            unit of             and aptitude tests;
                                                                                        administration,     preschool through
                                                                                        quality of study,   adult
                                                                                        type of test
                                                                                        (achievement vs. IQ)

Table 2



Mean Effect Size for All Levels of All Coded Variables

                                                Adequate validity      Inadequate validity
Variable                Level                    ES    SD      N        ES    SD      N

All studies                                      .20   .40    55        .29   .33    10

Total sample size       Small (0-75)             .32   .28    21
for study               Medium (76-150)          .11   .50    24        .40   .46     5
                        Large (150+)             .15   .30    10        .18   .08     5

Grade level             1st-3rd                  .03   .51    25        .14   .06     6
                        4th-6th                  .33   .39    30        .59   .54     3

Socioeconomic status    Low                      .18   .37    37        .33   .35     8
level                   Not low                  .24   .46    18        .11   .02     2

Use of reinforcement    No                       .22   .40    48        .29   .33    10
procedures as part of   Yes                     -.00   .43     7
training

Hours of training       Less than 1 hr           .09   .43    14        .37   .47     5
                        1 to 3 hrs               .09   .30    22        .20   .13     4
                        4 hrs+                   .40   .42    19

Use of practice tests   No                       .22   .43    42        .40   .46     5
as part of training     Yes                      .12   .30    13        .16   .07     4

Ability level of        Mixed                    .20   .52    47        .29   .33    10
students                High ability             .09   .21     3
                        Low ability              .31   .12     5

Type of assignment      Random                   .27   .39    40        .30   .40     7
to groups               Good matching            .24   .01     2        .28   .10     3
                        Poor matching           -.05   .37    13

Blinding of data        Yes                      .13   .44    34        .16   .07     4
collector               No                       .31   .30    21        .38   .42     6

Type of outcome         Achievement test         .10   .33    44        .29   .33    10
measure                 Test-wiseness test       .71   .57     5
                        Other (anxiety,          .44   .36     6
                          self-esteem, attitude)

ES = mean effect size for a particular group.

SD = standard deviation of effect size distribution for a
particular group.

N = number of effect sizes on which a computation is based.

Note. Several other variables including Percent Male, Percent
Handicapped, and Percent Minority were coded to determine whether mean
effect size covaried with such subject characteristics. Results for those
variables are not reported here because of infrequent reporting (e.g.,
Percent Handicapped could only be coded for 2% of the ES's), or lack of
variance (e.g., 97% of the ES's for Percent Male fell between 47% and 54%).

Table 3

Mean Effect Sizes on Achievement Test Scores, Broken Down
by Treatment Length, SES Level, and Grade Level

                                  Mean ES   SD ES   n ES   N studies

Less than 4 hours of treatment      .04      .30     18       7
4 or more hours of treatment        .29      .31     13       8

Low SES                             .14      .38     13      10
Not low SES                         .09      .31     31      13

Grades 1-3                          .01      .37     22       9
Grades 4-6                          .20      .26     22       9

Note. Achievement test scores, studies with adequate validity only.

Figure Captions



Figure 1. Mean effect size by treatment length and grade level.
Figure 2. Mean effect size by treatment length and SES.

[Figures 1 and 2: each plots mean effect size (vertical axis)
against length of treatment (less than 4 hours vs. 4 or more
hours). Figure 1 shows separate lines for Grades 1-3 and Grades
4-6, with each point labeled by the number of ESs and studies on
which it is based.]

IMPROVING THE TEST-TAKING SKILLS OF
LEARNING-DISABLED STUDENTS 1

THOMAS E. SCRUGGS AND DEBRA TOLFA
Utah State University

Summary.--16 learning-disabled second- and third-grade students were
matched on previous years' achievement scores and grade and assigned at
random to experimental and control conditions. Students in the experimental
condition were given 8 20-min. sessions of training in test-taking skills
particular to the Stanford Achievement Test. Analysis of test scores indicated
trained students scored significantly higher on one subtest of a shortened
version of the test than students who had not been trained.

Since the seminal article by Millman, Bishop, and Ebel in 1965 on
test-wiseness, or test-taking skills, interest has grown in the construct of
test-wiseness as a possible source of measurement error (5). Although some
specific groups and populations have been said to be low in "test-wiseness"
(9), the issue of whether or not students classified as learning disabled
exhibit the same test-taking skills as nondisabled peers has only recently
been investigated (10). Scruggs and Lifson (7) administered reading
comprehension test items with accompanying passages deleted to groups of
learning-disabled and nondisabled students. Their results indicated that,
although nondisabled students were able to take advantage of prior or partial
knowledge and deductive reasoning strategies to answer most of the questions
correctly, learning-disabled students were less able to utilize these
strategies. In another investigation (6) learning-disabled and nondisabled
students were interviewed regarding their strategies on
reading-achievement-test items. Results suggested that learning-disabled
students were less likely than their nondisabled peers to apply "appropriate
test-taking strategies" to reading-comprehension-test items and
learning-disabled students were more likely than nondisabled peers to be
misled by particular format demands on tests of "word-study skills" (i.e.,
phonetic analysis).

Although the above research indicates that learning-disabled students may
be lacking with respect to specific test-taking skills, this research does not
indicate that these students can easily be taught these skills to the extent
that achievement-test performance would improve. In fact, little is known
about teaching test-taking skills to learning-disabled students. Recently,
Dunn (2) successfully taught test-taking skills to a sample of junior high
school-age learning-disabled students, but to date, test-taking skills have
not been taught to elementary-aged learning-disabled students. The purpose of
the present research was to determine whether specific test-taking skills
could be taught to elementary-aged learning-disabled students to improve their
performance on standardized achievement-test items.

1 This research was supported in part by a grant from the Department of
Education, Office of Special Education, No. G008300008. The authors thank
Marilyn Tinnakul and Mary Ellen Heiner for their assistance in the preparation
of the manuscript. Address requests for reprints to Thomas E. Scruggs, Ph.D.,
UMC 68, Developmental Center for Handicapped Persons, Utah State University,
Logan, Utah 84322.

Method 

Subjects were 16 second- and third-grade learning-disabled students attending
special education classes in a western metropolitan area. 3 Criteria for placement as
learning disabled included average intelligence coupled with 40% discrepancy between
ability and at least two areas of academic functioning. Although IQs were not available
for this study, all students were said to have been functioning within a normal range of
intelligence. Students were individually matched on the basis of grade and previous
year's reading test scores and assigned at random to either experimental or control group.
Average reading percentile was 29.0 (SD = 18.5) for the experimental group and 28.3
(SD = 19.7) for the control group. Average age for each group was 7 yr., 8 mo. (SDs =
8 mo. and 6.5 mo.; ranges 7 yr. to 8 yr., 4 mo. and 7 yr., 1 mo. to 8 yr., 6 mo.,
respectively, for experimental and control groups). Five (62.5%) second graders and three
(37.5%) third graders were in each group; the experimental group contained four girls
and four boys, while the control group contained three girls and five boys.

Materials were eight scripted lessons for each grade in a direct-instruction format
and accompanying workbooks for students which included pencil-and-paper practice
activities.³ All items were similar to, but not exact items from, the Stanford Achievement
Test. The general test-taking strategies taught in these materials included attending,
marking answers carefully, choosing the best answer carefully, error-avoidance strategies,
and appropriate situations for soliciting the teacher's attention. Specific test-taking
strategies were taught for each reading subtest in the Stanford Achievement Test. These
included structured practice on specific test formats for each subtest, and specific
application of general test-taking strategies to each specific subtest. For example, with respect
to the "letter-sound" component of the Word Study Skills subtest, students were taught
to employ the following sequence of strategies: Look at and read the first word.
Pronounce to yourself and think of the sound of the underlined letter. Carefully look at
the underlined choices and choose the word with the same sound as the underlined letter.
If you don't know all the words, read the words you do know or read parts of individual
words you may know. If you're not sure of the answer, see if there are some answers
that you are sure are not correct and eliminate those. Color in the answer quick, dark,
and inside the line. Guess if you are still not sure; never skip an answer.

Experimental subjects were taught in small groups for four 20-min. lessons per week
for 2 wk. Positive responding and attention to task were reinforced with stickers.

The first seven sessions taught the use of test-taking strategies within the specific
context of each of the reading-related subtests. The last session consisted of a general
review of all previous procedures. Each day of instruction involved extensive work with
practice activities applied to practice test items. Students were given no information
concerning the content of the actual test not specified in the published test directions.

²A small group of fourth-grade learning-disabled students was originally intended for
inclusion in the study but had to be dropped because attrition and methodological
problems were associated with the test administration for this group.
³T. E. Scruggs & J. Williams, SUPER SCORE: test-taking manuals and workbooks.
(Unpublished training materials, Utah State University, 1984)

Following the last training procedure and posttest, all trained and control students were
administered shortened versions of the reading subtests of the Stanford Achievement Test.

Items were taken from the Primary 2 level, Form E and Primary 3 level, Form E.
The shortened version for Primary 2 level included the first 13 items on the
Comprehension subtest and the first 16 items on the Word Study Skills subtest. The shortened
version for the Primary 3 level included Items 9 to 22 on the Comprehension subtest and
Items 1 to 9 and 19 to 32 on the Word Study Skills subtest. The Primary 2 test had a
total of 13 Comprehension questions and 16 Word Study questions, while the Primary 3
test had a total of 14 Comprehension questions and 23 Word Study questions. The
number of items was chosen for each condition to represent the number of items expected
to be completed in 20 min., according to directions. Although the subtests were shortened
to accommodate the students' scheduling constraints, standardization procedures were
adhered to in the administration of the test, which was done in the resource setting by an
administrator unfamiliar to the students and unaware of group membership of the
students. Percent correct scores were analyzed instead of mean number correct because
there was a different total number of items for each subtest and level.
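The item counts and the percent-correct conversion described above can be sketched as follows. This is only an illustration: the function and dictionary names are hypothetical, and the item ranges are taken directly from the description of the shortened tests.

```python
def items_in_ranges(ranges):
    """Count the items covered by inclusive (first, last) item-number ranges."""
    return sum(last - first + 1 for first, last in ranges)


# Inclusive item ranges for each shortened subtest, as described in the text.
SHORTENED_TEST = {
    ("Primary 2", "Comprehension"): [(1, 13)],
    ("Primary 2", "Word Study Skills"): [(1, 16)],
    ("Primary 3", "Comprehension"): [(9, 22)],
    ("Primary 3", "Word Study Skills"): [(1, 9), (19, 32)],
}

ITEM_COUNTS = {key: items_in_ranges(r) for key, r in SHORTENED_TEST.items()}


def percent_correct(n_correct, level, subtest):
    """Convert a raw score to percent correct for the given level and subtest."""
    return 100.0 * n_correct / ITEM_COUNTS[(level, subtest)]
```

Because the Primary 2 and Primary 3 forms differ in length, a raw score of 8 on Word Study Skills means different things at each level; dividing by the level-specific item count puts both on one scale.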

RESULTS AND DISCUSSION 
Percent correct scores for experimental and control students were compared
statistically by means of t tests for independent means,⁴ for Word Study
Skills, Reading Comprehension, and combined subtests. Descriptively,
experimental students scored an average of 77.1% (SD = 13.6), 48.9% (SD =
32.3), and 63.0% (SD = 20.6), for Word Study Skills, Reading Comprehension,
and combined subtests, respectively. Control students, by contrast, scored
56.8% (SD = 20.1), 50.3% (SD = 24.3), and 55.4% (SD = 15.1) on the
same subtests. The only significant difference between groups was on the
Word Study Skills subtest (t(14) = 2.38, p = .03). Differences were not found
on either the Reading Comprehension (t(14) = -.10) or the total subtest (t(14)
= 1.05) scores.
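As a rough check on the reported statistics, a t test for independent means can be recomputed from the published means and standard deviations. The sketch below assumes a pooled-variance t test with eight students per group, and it reads the partly illegible control-group figures as SD = 20.1 (Word Study Skills) and mean = 50.3% (Reading Comprehension); both readings are assumptions. The function name is hypothetical.

```python
import math


def independent_t(mean1, sd1, n1, mean2, sd2, n2):
    """Pooled-variance t statistic and degrees of freedom from summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se_diff = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean1 - mean2) / se_diff, df


# Experimental vs. control, eight students in each group.
t_word_study, df = independent_t(77.1, 13.6, 8, 56.8, 20.1, 8)
t_comprehension, _ = independent_t(48.9, 32.3, 8, 50.3, 24.3, 8)
```

Under these assumptions the Word Study Skills value comes out close to the reported 2.38 and the Reading Comprehension value close to the reported -.10; small differences are expected because the published means and SDs are rounded.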

It was seen that learning-disabled students trained in test-taking skills
significantly outperformed their untrained peers on the Word Study Skills
subtest, but not the Reading Comprehension subtest, of a modified version of the
Stanford Achievement Test. Although it is not certain why performance was
improved on one subtest but not another, it is possible that performance on
the Word Study Skills subtest was more easily trained because this subtest
contained several different formats, introduced over a short period of time, which
may have been confusing to the control students. The resulting effect size for
this subtest (1.01 SD units) as well as the total score effect size (.65 SD units)
are substantially larger than those reported in the literature (1, 8) and may
indicate that the deficit in test-taking skills may be somewhat stronger for this
sample than others, as supported by recent research.
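The text does not state how the effect sizes were computed. One common convention, assumed here, divides the difference between group means by the control group's standard deviation. With the control SD for Word Study Skills read as 20.1 (an assumption, since the figure is partly illegible), this convention reproduces the reported 1.01; the total-score value is not checked here. The function name is hypothetical.

```python
def effect_size_control_sd(mean_treated, mean_control, sd_control):
    """Mean difference expressed in control-group standard deviation units."""
    return (mean_treated - mean_control) / sd_control


# Word Study Skills: experimental mean, control mean, control SD.
word_study_es = effect_size_control_sd(77.1, 56.8, 20.1)
```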

⁴Since subjects were matched, it is possible to compute t tests for correlated data; this was
not done here since scores of matched subjects were not correlated on the posttest.
Corresponding t ratios for correlated data (df = 7) were essentially equivalent at 2.20
(p = .06), -0.16, and 1.12 for Word Study Skills, Reading Comprehension, and
total subtests, respectively.

At least some aspects of the training appear to have been effective in
raising test performance; however, the use of a no-treatment control group
prohibits drawing conclusions regarding what specific aspects of the training
were most effective. Further research could help clarify these variables.

Although it is true that the use of standardized achievement tests in special
education is a controversial issue (4), it is also true that it is the obligation
of special education personnel to maximize the functioning of learning-disabled
students whenever possible, including performance on standardized achievement
tests. It is also true that the skills taught for use on the Stanford Achievement
Test may be even more valuable for teacher-made tests, which may contain
even more cues for the effective use of test-taking skills. Although the
findings of the present investigation are promising, the small sample and the
reduced version of the Stanford Achievement Test used as a dependent measure
indicate that replication of these findings is necessary.
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Abstract 

Fifty-eight third graders from two elementary school classrooms
were assigned at random to test-training and placebo groups.
Students in the test-training group received six sessions of
test-wiseness training specifically tailored to the Comprehensive
Test of Basic Skills. Students in the placebo group received six
sessions of creative writing exercises. The effectiveness of this
training on achievement test scores was obscured due to the
presence of ceiling effects. Supplementary analyses, however,
provided some limited support for the effectiveness of this
training. Trained and untrained groups were not seen to differ on
measures of on-task behavior during the testing situation. An
analysis of reported attitudes toward tests taken immediately
after the three-day testing period suggested that (a) the
standardized test experience was a stressful one for control
subjects, and (b) the test-wiseness training had exerted a
significant ameliorating effect on attitudes in the treatment
group. Results suggested that test-wiseness training may reduce
levels of anxiety in elementary school children during test
situations.

The Effects of Training in Test-Taking Skills on
Test Performance, Attitudes, and On-Task
Behavior of Elementary School Children

In recent years, the effectiveness of coaching on
achievement test performance has been well studied (see Sarnacki,
1979, and Fueyo, 1976, for reviews). In a recent meta-analysis,
Bangert-Drowns, Kulik, and Kulik (1983) determined that coaching
for achievement tests in the elementary grades produced a
generally facilitative effect (average effect size = .29) over all
studies reviewed. More recently, Scruggs, Bennion, and White
(1984) have argued that although training in test-taking skills
does often produce an effect in the elementary school grades, this
effect is dependent upon other factors, for example, length of
training, age of students, and economic level of the students
trained. Although researchers in the area of test-wiseness
training have often examined variables in addition to actual test
scores, such as performance on test-wiseness tests and self-esteem,
they have not addressed the issue of whether or not such training
changes in any way the attitudes of elementary school children
toward tests. This in itself could be an important finding for,
considering the degree to which school-age children are subjected
to testing procedures, it would be helpful to ensure that such
tests were not unnecessarily stressful. In addition, whether or
not training in test-taking skills has a facilitative influence on
the level of effort the students put into the test situation
remains unclear. Such effort may be evaluated by means of the
amount of time on-task students exhibit during standardized
testing.



The present investigation was intended to address some of
these issues by providing training in test-taking skills to a
sample of third grade students and assessing, in addition to test
performance, reported attitudes towards the test-taking
experience and percent of time actually spent on-task during test
administration. Although the effects of test-wiseness training
have been well-documented in the past, the present investigation
was intended to shed some light on peripheral issues and to
address more specifically exactly what changes in attention and
attitude occur as a result of coaching on achievement tests.



Method

Subjects were 58 elementary-age school children attending the
third grade in two different classrooms at a western rural school
district. Sex was evenly distributed. Subjects were selected at
random from both classes to participate in treatment and placebo
groups.

Materials

Materials included a manual with six scripted 20- to 30-minute
lessons in test-taking skills specifically tailored to the
reading subtests of the Comprehensive Test of Basic Skills (CTBS),
Level E. These materials were developed specifically for this
project and included student workbooks for practice activities by
the students (Williams, 1984).
Procedure 

Over a two-week period, treatment students were administered
six lessons in test-taking skills appropriate to the reading
subtest of the CTBS, by a trained, outside experimenter. These
lessons included, for example, time-using strategies, deductive
reading strategies, error avoidance strategies, and specific
practice activities in each of the subtests. To control for
possible Hawthorne effects, the placebo group was given six
exercises in creative writing by an outside experimenter at the
same time treatment students were receiving test training.
Within three days after the conclusion of training, students were
given the CTBS by their regular classroom teachers in their
regular instructional classes. During the taking of this test,
observational measures were taken of on-task behavior of students
by four trained observers unaware of group memberships of the
students being observed. The observers employed a time-sampling
procedure on an interval of 30 seconds. Each student
observed was observed for 30 minutes. On-task behavior was
computed as percentage of times sampled on-task during actual test
performance and on-task behavior while directions were being
given. On-task behavior during directions was defined as
orientation of student's eyes toward either teacher or test
booklet and pencil-and-paper compliance with accompanying sample
activities. On-task during testing was defined as student's eyes
directed toward test booklet, pencil in hand, actively marking,
reading, or asking teacher direct questions with specific
reference to the test. After completion of the third and final
day of testing, students were given an attitude toward tests
questionnaire (see Figure 1). This questionnaire consisted of 10
items in an agree/disagree format. Students completed the
questionnaire together while the teacher read items to the class.

Insert Figure 1 about here
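The on-task measure described in the procedure can be illustrated with a short sketch. The names and the example record are hypothetical; only the 30-second interval and the 30-minute observation period come from the text.

```python
def percent_on_task(samples):
    """samples: list of booleans, one per sampled moment (True = on-task)."""
    if not samples:
        raise ValueError("no observations recorded")
    return 100.0 * sum(samples) / len(samples)


# 30 minutes sampled once every 30 seconds gives 60 observation points.
SAMPLES_PER_STUDENT = (30 * 60) // 30

# Hypothetical record: a student judged on-task at 45 of the 60 sampled moments.
record = [True] * 45 + [False] * 15
example_percent = percent_on_task(record)
```

The same function applies separately to the moments sampled during directions and during testing, since the two periods were scored with different definitions of on-task behavior.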

Results 

Achievement 

Mean scores on the reading subtest of the CTBS were computed
and compared statistically by means of t tests. As can be seen in
Table 1, none of the group differences are statistically
significant. Interpretation is not possible, however, due to the
presence of overwhelming ceiling effects exhibited on all
subtests.

Insert Table 1 about here

A supplementary analysis was conducted on the lower half of
each group chosen by the previous year's total reading scores and
is given in Table 2. This analysis indicates that standardized
gain scores between second and third grade testing were
significantly higher in favor of the treatment group on Word
Attack Subtest and Total Reading Score.

Insert Table 2 about here

On-Task Behavior

Mean on-task behavior during directions, during testing, and
total is given in Table 1. As can be seen, no significant group
differences were found.

Attitudes toward Tests

Reliability of the attitude measure was computed by means of
a Kuder-Richardson 20 formula and was given at .88, indicating a
moderately strong degree of internal consistency for a measure of
this type. Differences between the mean scores of the two groups
were nonsignificant, t less than 1 in absolute value. An
inspection of Figure 2, however, shows that the distribution of
these two groups differs strongly. These differences are most
obvious when one employs a curve-smoothing technique of combining
the mean scores for each of two adjacent frequencies and are given
in the same figure. The difference between these dispersions was
tested statistically in two ways: mean differences from the mean
in standard scores were computed for subjects in each group and
compared statistically. The mean distance from the mean of the
placebo group was statistically greater than the average distance
from the mean in the training group (p < .01). In addition, a
Kolmogorov-Smirnov two-sample test (Siegel, 1956) was applied to
each half of the distribution. For the lower half of each
distribution (that is, students scoring 0 through 5 on the
measure), the distributions were statistically different (Z =
1.529, p < .02), while the upper half of each distribution was not
seen to differ significantly (Z = .756, p = .617).

Insert Figure 2 about here
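For readers unfamiliar with the Kolmogorov-Smirnov two-sample test, the sketch below shows the statistic it rests on: the largest gap between the two groups' cumulative distributions. It is assumed, not confirmed by the text, that the reported Z is this gap scaled by the square root of n1*n2/(n1+n2). The scores are invented for illustration and are not the study's data.

```python
import math


def ks_two_sample(sample_a, sample_b):
    """Return (D, Z): the largest gap between the two empirical cumulative
    distributions, and D scaled by sqrt(n_a * n_b / (n_a + n_b))."""
    n_a, n_b = len(sample_a), len(sample_b)
    d_max = 0.0
    for x in sorted(set(sample_a) | set(sample_b)):
        cdf_a = sum(1 for v in sample_a if v <= x) / n_a
        cdf_b = sum(1 for v in sample_b if v <= x) / n_b
        d_max = max(d_max, abs(cdf_a - cdf_b))
    z = d_max * math.sqrt(n_a * n_b / (n_a + n_b))
    return d_max, z


# Hypothetical lower-half attitude scores (0 through 5); not the study's data.
placebo_low = [0, 0, 0, 1, 1, 1, 2, 5]
training_low = [2, 3, 3, 4, 4, 5, 5, 5]
d, z = ks_two_sample(placebo_low, training_low)
```

In this invented example most placebo scores pile up at 0 and 1 while the training scores do not, so the cumulative distributions separate early and the statistic is large.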

Discussion 

The present investigation does not offer conclusive evidence
that the particular training package employed significantly
improved test scores, due to the ceiling effects reported in the
Results section. However, it was found that students in the lower
half of the treatment group exhibited statistically higher gain
scores over the previous year's testing than did the lower half of
the placebo group. Particularly, this type of training has
previously been seen to demonstrate a significant effect on a
subtest similar to the Word Attack subtest in a sample of learning
disabled and behaviorally disordered children (Scruggs &
Mastropieri, in press).

That achievement test coaching results in greater levels of
on-task behavior on the part of students was not supported by the
present investigation. Student on-task behaviors while listening
to directions and while taking the test itself were very similar.

Analysis of the attitude data did suggest that students in
the treatment group reported more "normal" attitudes than those in
the placebo group. The abnormal distribution of scores in the
placebo group is highly reminiscent of that of a population under
stress (see Wilson, 1973). The fact that the abnormally high
number of very negative attitudes was not present in the treatment
condition while the number of strongly positive attitudes was
relatively similar suggests that this treatment may have
contributed to more positive attitudes on the part of those
students who may otherwise have developed strong negative
reactions to the test and the test-taking situation. It should be
noted here that completely positive attitudes toward tests were not
expected and are not necessarily a realistic expectation. What was
expected was a roughly normal distribution centering around the
mean of about 5, which is in fact the distribution seen in the
training group. The large proportion of extreme scores in the
placebo group (with fully two-thirds of the scores within 1 point
of 0 or 10) suggests that the population had been subjected to
some stress and had reported widely polarized views on the
test-taking process. In the training group, these attitudes seemed to
have been ameliorated substantially.
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Table 1 

T-Tests by Group

                                                              2-tail
Variable                         N     Mean      SD       T    prob.

CTBS Reading Subtests
  Word attack
    Tx                          29    29.79    4.87   [illegible]
    Cx                          29    29.72    5.37
  Vocabulary
    Tx                          29    26.31    4.58     .49    .624
    Cx                          29    26.90    4.47
  Comprehension
    Tx                          29    26.48    4.06     .79    .434
    Cx                          29    25.51    5.21
  Total reading
    Tx                          29    82.59   12.35     .13    .898
    Cx                          29    82.14   14.04

CTBS total battery
    Tx                          29   150.17   24.68    -.60    .549
    Cx                          29   154.03   24.10

Attitude toward test-taking
    Tx                          29     5.59    2.97     .59    .557
    Cx                          27     5.04    3.[illegible]

On-task during directions
    Tx                          18    45.28   15.78    -.75    .458
    Cx                          18    50.06   21.89

On-task during testing
    Tx                          18    77.67   16.18     .07    .941
    Cx                          18    77.28   14.98

Total on-task
    Tx                          18    65.78   14.76     .45    .656
    Cx                          18    67.78   11.82
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Table 2 
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Gain Score Differences 


Between the Lower Half of Each Group (Chosen by Last Year's Total Reading)

                                                  Std.
Variable                   N     Mean       SD    Error       T    Prob.

Word attack
    Tx                    12    25.83    39.55    11.42    2.41     .012
    Cx                    14   -20.86    47.06    12.58

Vocabulary
    Tx                    12    18.67    50.77    14.66    [illegible]
    Cx                    14     7.93    58.69    15.69

Comprehension
    Tx                    12    53.17    37.96    10.96    1.46     .158
    Cx                    14    24.79    57.54    15.38

Total of all subtests
    Tx                    12    97.67    52.64    15.20    2.51     .019
    Cx                    14    11.86   107.92    28.84
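The error column and the T column of Table 2 can be related to the other columns with a short calculation. The sketch below assumes that the error column is the standard error of the mean (SD divided by the square root of N) and that T is a pooled-variance t test for independent groups; neither is stated in the table itself. It uses the "Total of all subtests" rows, and the function names are hypothetical.

```python
import math


def standard_error(sd, n):
    """Standard error of a group mean."""
    return sd / math.sqrt(n)


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Pooled-variance t statistic and degrees of freedom."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    return (mean1 - mean2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2)), df


# "Total of all subtests" rows: treatment (Tx) and control (Cx) gain scores.
se_tx = standard_error(52.64, 12)
se_cx = standard_error(107.92, 14)
t_total, df_total = pooled_t(97.67, 52.64, 12, 11.86, 107.92, 14)
```

Under these assumptions the two standard errors round to the tabled 15.20 and 28.84, and the t value rounds to the tabled 2.51 on 24 degrees of freedom.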
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Figure Captions 



Figure 1. Attitude measure.
Figure 2. Distribution of attitude scores.
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Circle YES or NO. ^ 



YES   NO    1. Taking a test is my favorite thing to do at school.
YES   NO    2. Sometimes I am nervous when I take a test.
YES   NO    3. I look forward to taking a test.
YES   NO    4. I dislike taking a test when I don't know the answers.
YES   NO    5. I wish we had fewer tests.
YES   NO    6. Taking a test is always fun.
YES   NO    7. I like tests even when I don't know the answers.
YES   NO    8. Taking a test is one of the worst things about school.
YES   NO    9. I would rather do something else besides take a test.
YES   NO   10. I wish we had more tests.
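The questionnaire above yields a score from 0 to 10, and its reliability was reported with the Kuder-Richardson 20 formula. The sketch below shows one way both could be computed. The scoring key is an assumption, since the source does not give one: a YES on items 1, 3, 6, 7, and 10 and a NO on the remaining items are counted as positive attitudes. The variance convention (dividing by the number of students) is also an assumption, and all names are hypothetical.

```python
# Assumed scoring key (not given in the source): YES is the positive answer on
# these items; NO is the positive answer on all other items.
POSITIVE_IF_YES = {1, 3, 6, 7, 10}


def attitude_score(answers):
    """answers: dict mapping item number (1-10) to 'YES' or 'NO'. Returns 0-10."""
    score = 0
    for item, answer in answers.items():
        said_yes = answer.upper() == "YES"
        if said_yes == (item in POSITIVE_IF_YES):
            score += 1
    return score


def kr20(item_matrix):
    """Kuder-Richardson 20 for rows of 0/1 item scores (one row per student).

    Assumes at least two items and some spread in total scores; a total-score
    variance of zero would make the ratio undefined.
    """
    n_students = len(item_matrix)
    k = len(item_matrix[0])
    totals = [sum(row) for row in item_matrix]
    mean_total = sum(totals) / n_students
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_students
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in item_matrix) / n_students
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / var_total)
```

A student answering YES to every item would score 5 under this key, which is one reason a roughly central score need not indicate a neutral attitude.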
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PLACEBO 




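[Figure 2. Distribution of attitude scores, plotted by group. One panel has
ATTITUDE SCORE (0 through 10) on the horizontal axis; the other has COMBINED
ATTITUDE SCORES, the smoothed values for adjacent score pairs 0-1 through 9-10.]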
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Abstract 

Seventy-six third and fourth grade children classified as learning
disabled (LD) or behaviorally disordered (BD) were randomly
assigned to treatment and control groups. Students assigned to
the treatment condition were taught test-taking skills pertinent
to reading achievement tests. Students were taught in small
groups over a two-week period in such strategies as attending to
appropriate stimuli, marking answers carefully, time using, and
error avoidance. Following the training procedures, students were
administered standardized achievement tests in their normal
classroom assignments. Results indicated that trained students
scored significantly higher on the Word Study Skills subtest of
the Stanford Achievement Test. Scores on the Reading
Comprehension subtest were not affected by training. The
relevance of these findings to assessment in special education is
discussed.

Improving the Test-Taking Skills of Behaviorally Disordered
and Learning Disabled Children
Successful performance in school is to a great extent
dependent upon the application of effective learning and
problem-solving strategies on academic tasks. Students are often called
upon to meet particular format and task demands of academic
assignments with effective strategies for dealing with these tasks
and successfully completing them. Much of the failure of learning
disabled (LD) students in school-related tasks has been attributed
to a lack of ability in applying such problem-solving strategies
(Reid & Hresko, 1980). A body of literature has been established
in recent years which documents difficulties of learning disabled
students in employing appropriate learning and problem-solving
strategies in school. Particular deficits have been noted in the
areas of: (a) attending to the critical components of a task
(Atkinson & Seunath, 1973; Hallahan & Reeve, 1980; Hallahan,
Kauffman, & Ball, 1973; Ross, 1976; Tarver, Hallahan, Kauffman, &
Ball, 1976), (b) selecting a strategy appropriate to addressing a
particular academic task (Mastropieri, Scruggs, & Levin, in press;
Torgesen, 1977; Torgesen & Goldman, 1977), and (c) effectively
employing appropriate problem-solving strategies (Hallahan, 1975;
Spring & Capps, 1974; Torgesen, Murphy, & Ivey, 1979).

Given the above documented deficiencies, it would appear that
one area of particular difficulty for learning disabled and
perhaps other mildly handicapped children would be the attentional
and problem-solving strategies necessary for successful completion
of standardized achievement tests. In these group-administered
tests, learners are typically expected to function individually in
large-group situations, effectively employ time constraints, and
develop and employ strategies specifically suited to answering
questions which may be ambiguous or to which the answers are often
not completely known (Haney & Scott, 1980). Some recent research
with learning disabled students indicates that these students do,
in fact, exhibit deficiencies with respect to use of effective
strategies in standardized test-taking situations. Scruggs and
Lifson (1985) administered questions from standardized reading
comprehension tests to LD and non-LD students without providing
the accompanying reading passages. Their results indicated that,
although non-LD students were able to answer most "reading
comprehension" questions without reading the accompanying
passages, LD students were less successful. This investigation
reiterated previously asked questions concerning what reading
comprehension tests actually measure, and also suggested that many
LD students may lack some specific test-taking strategies, such as
effective use of partial and/or prior knowledge, error avoidance,
and elimination strategies. Drawing upon a previous investigation
with mostly nondisabled children (Scruggs, Bennion, & Lifson, in
press a), Scruggs, Bennion, and Lifson (in press b) recently
interviewed learning disabled and non-disabled children with
respect to the manner in which they had interpreted and answered
reading achievement test items. Analysis of these strategy
reports indicated that (a) LD students were less likely to select
and utilize strategies appropriate to different types of test
questions, and (b) LD students were more likely to be negatively
influenced by misleading distractors. Such results suggested that
learning disabled and perhaps other mildly handicapped populations
may have more difficulty than other students adapting to specific
task and format demands of standardized achievement tests and,
consequently, resulting scores may be less valid estimations of
potential performance than those of other students. Although any
observed deficit in "test-taking strategies" on the part of mildly
handicapped children would be expected to be representative of
more global problem-solving strategy deficits in school-related
tasks on the whole, it may be possible that specific training in
test-taking skills may be particularly beneficial to children
referred for learning and/or behavior problems. Scruggs, Bennion,
and Lifson (in press b) hypothesized that, due to differences in
format and strategy demands, strategies appropriate for word
analysis subtests may be more easily trained than strategies
appropriate for reading comprehension subtests.
Previous attempts have been made to improve achievement test
scores in regular classrooms by coaching in test-taking skills,
but the results have been somewhat mixed and seem to have had a
differential effect on different populations. Scruggs, Bennion,
and White (in press), in a recent meta-analysis, reported that
students from the primary grade levels and students from low
socioeconomic backgrounds tended to differentially benefit from
extended training in test-taking skills. This finding does
suggest that mildly handicapped students may also benefit from
instruction in some of the critical skills they apparently lack
when confronted with standardized achievement tests.

Scruggs (1984) recently reported the training of test-taking
skills to a small sample of LD children. After eight training
sessions had been completed, experimental and control students
were administered a reduced version of the Stanford Achievement
Test (SAT), reading subtests. Results indicated that the
experimental students gained significantly on a pre-post criterion
measure of test-taking skills, and scored significantly higher
(according to a non-parametric test of ranks) on the shortened SAT
subtests. Although these results are encouraging, several
questions remain. First, could a larger group of mildly
handicapped children, including behaviorally disordered (BD)
students, be shown to gain from such training? Second, would this
training transfer to a standardized administration of the SAT?
Finally, if this training could be shown to be successful, it
would be interesting to know the actual size of the effect in
percentile points, so that an estimate of the practical importance
of the treatment could be made. It was the purpose of the present
investigation to address these issues.

Method 

Subjects 

Subjects were 76 third and fourth grade students attending
resource rooms or self-contained classes in a large western
metropolitan school district. Forty students were third graders
and 36 were attending fourth grade classes; 54 of the subjects
were boys and 22 were girls. Reading achievement test data are
given in the "Results" section. Fifty students were classified as
BD, and 26 students were classified as LD according to federal,
state, and local school district criteria. For behavioral
disorders, the definition included students whose behavioral or
emotional functioning over time adversely affected educational
performance and required special education service. For learning
disabilities, the definition included a 40% discrepancy between
ability and achievement. Although specific academic deficiencies
were not criteria for BD classification, a separate analysis of
achievement scores of LD and BD children in this particular
district indicated that differences in academic achievement
between the two groups were trivial (Scruggs & Mastropieri, 1984).
Eighteen students were enrolled in self-contained classes, and 58
students were attending resource rooms. Subjects were stratified
by grade level and randomly assigned to experimental and control
groups, without regard to category of exceptionality.
Materials' ' ■ f 

Materials were developed as part of a larger project 
involving improving test-taking skills of LD and BD elementary
students (Taylor & Scruggs, 1983) and consisted of eight scripted
lessons for each grade level in a direct instruction format and
accompanying workbooks for students which included
pencil-and-paper practice activities (exact materials used are given in
Scruggs & Williams, in press). The general test-taking strategies
taught in these materials included attending to directions,
marking answers carefully, choosing the best answer carefully,
error avoidance strategies, and appropriate situations for
soliciting teacher attention. In addition, specific test-taking
strategies were taught for each reading subtest in the Stanford 
Achievement Test. These included structured practice in specific 
test formats for each subtest and specific application of general 
test-taking strategies to each specific subtest. For example, 
with respect to the letter-sound subtest, students were taught to 
employ the following sequence of strategies: 

1. Read the first word. 

2. Pronounce to yourself and think of the sound of the
underlined letter.

3. Carefully look at all the answer choices and choose the
word with the same sound as the underlined letter.

4. If you don't know all the words, read the words you do
know, or read parts of individual words that you may know.

5. If you are not sure of the answer, see if there are some
answers that you are sure are not correct, and eliminate those.

6. Color in the answer quick, dark, and inside the line.

7. Guess if you are not sure; never skip an answer.

Procedure

Experimental subjects were taught by four trained
experimenters in small groups ranging from one to five in size.
Four 20-30-minute lessons were given per week for two weeks.
Positive responding and attention to task were reinforced with
stickers. Immediately prior to the training sessions, and
immediately after the last training session, students were
administered a criterion test of the skills which were taught.
This test was a 10-item test of test-taking skills including
questions about time using, question asking, and elimination
strategies. The first seven sessions taught the use of
test-taking strategies within the specific context of each of the
reading-related subtests. The last session consisted of a general
review of all previous procedures. Each day of instruction
involved extensive work with practice activities applied to
practice test items. At no time during this training procedure
were subjects taught any information concerning the content of the
test which was not given in the published test directions. Within
five days of completion of the training sessions, students were 
administered the Stanford Achievement Test. This administration 
was done in the regular or self-contained classroom settings by 
their regularly assigned teachers. Although teachers were aware 
of the membership of each student in the experimental group,
response protocols were scored by machine.

Results

Pre and posttests of the experimental students on the 
criterion measure were compared statistically by means of a 
correlated t test. It was found that the performance on the 
posttest was significantly higher than pretest scores (p < .01).
Students scored an average of 40% correct on the pretest,
and 77% correct on the posttest. 

Eight students (5 experimental and 3 control) did not
complete either or both subtests of the SAT and were excluded from
further analysis. Experimental students scored an average of the
25.3 percentile (SD = 20.0) on the Word Study Skills Subtest and
the 16.8 percentile (SD = 15.0) on the Reading Comprehension
Subtest of the SAT. Control subjects scored an average of the
17.4 percentile on Word Study Skills (SD = 18.3) and the 16.4
percentile (SD = 15.0) on Reading Comprehension. Student
percentile scores were entered into a 2 (group) X 2 (subtest)
analysis of variance (ANOVA), with repeated measures on the
subtest variable (Winer, 1971), which yielded significance on
subtests, F(1,66) = 4.96, p < .03, and group X subtest interaction,
F(1,66) = 7.06, p < .01. The main overall effect by group was not
statistically significant, F(1,66) = 1.21, p < .30. Analysis of
simple effects (Winer, 1971) indicated that experimental and
control students differed significantly with respect to the Word
Study Skills subtest, t(66) = 2.07, p < .05, but not the Reading
Comprehension subtest, t(66) = -.15, p > .20. The group X subtest
interaction is depicted graphically in Figure 1.

Insert Figure 1 about here 



Discussion 

The analysis of pre and posttest scores indicated that
test-taking skills could be successfully taught to this sample of
third and fourth grade mildly handicapped children. The fact that
significant gains were made in these critical skills suggests that
mildly handicapped children at this age level do lack certain
test-taking skills which are potentially useful in taking
standardized achievement tests.

Analysis of the test data indicated that training in
test-taking skills did significantly increase scores on the Word Study
Skills Subtest of the Stanford Achievement Test for this sample of
mildly handicapped students. The overall effect size for this
investigation, .20, is twice as large as the mean effect size
found for similar investigations with elementary school aged
non-handicapped children (Scruggs, Bennion, and White, in press), but
similar to that obtained for primary grade students under 
conditions of extended training (for this age group, an effect 
size of .10 is equivalent to approximately one month of academic 
achievement). The effect size of .43 for the Word Study Skills 
subtest is comparable to the mean effect size found for children 
of low socioeconomic status (SES) under conditions of extended 
training, but much higher than mean effect sizes found for higher
SES children, or lower SES children with shorter training periods 
(Scruggs, Bennion, & White, in press). 
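
As a rough check on the figures above, the reported .43 is what one obtains by dividing the difference between the experimental and control mean percentiles on Word Study Skills by the control group's standard deviation. That the authors computed it this way is an assumption, not something the text states; the function names below are hypothetical, and the sketch does not try to reproduce the overall value of .20. The month conversion uses the rule of thumb quoted in the paragraph (an effect size of .10 is roughly one month of achievement).

```python
def standardized_mean_difference(mean_treated, mean_control, sd_control):
    """Difference in means expressed in control-group standard deviations."""
    return (mean_treated - mean_control) / sd_control


def months_of_achievement(effect_size, effect_per_month=0.10):
    """Convert an effect size to months, using the .10-per-month rule quoted above."""
    return effect_size / effect_per_month


# Means and standard deviations taken from the Results section.
word_study = standardized_mean_difference(25.3, 17.4, 18.3)
reading = standardized_mean_difference(16.8, 16.4, 15.0)

print(round(word_study, 2))                         # 0.43
print(round(reading, 2))                            # 0.03
print(round(months_of_achievement(word_study), 1))  # 4.3
```

Read this way, the Word Study Skills gain corresponds to a little over four months of achievement, while the Reading Comprehension difference is close to zero.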

As predicted by recent research (Scruggs, Bennion, & Lifson, 
in press b), performance was increased on the Word Study Skills 
subtest and not the Reading Comprehension subtest. The fact that 
the Word Study Skills subtest was increased significantly may be a 
function of the fact that this particular subtest involves many 
format changes over a short period of time, and thus was more 
amenable to increased performance through guided practice and 
feedback on successful skills necessary for completion of the 
subtest. Strategy deficits previously observed on the Reading 
Comprehension subtest, however, were not thought to be easily
remediable. These deficits included ineffective use of deductive 
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reasoning strategies, inability to distinguish between recall and 
inferential questions, and inappropriate levels of confidence in 
answer choices (Scruggs, Bennion, & Lifson, in press b). 

The finding of positive training effects replicates that of
Scruggs (1984), and extends it to a larger population representing
different categories of exceptionality on a standardized test
administration. Although the present results are encouraging, 
several questions remain. First, students in this investigation 
were trained by project personnel in order to ensure fidelity of
treatment. The extent to which teacher implementation would
affect results is not known.² Second, the overall sample size,
the fact that subjects were not stratified by category of
exceptionality, and the disproportionally small number of LD
students in the present sample did not allow sufficient power
(Cohen, 1969) to separately assess the effects for LD vs. BD
students, although it may be interesting to do so in future
research. Also, it is not certain which training procedures were
most responsible for the observed effects. It is likely, however,
that training in strategies needed for meeting specific format
demands was more beneficial than the training given in general
test-taking strategies (e.g., time-using strategies), for the
reason that a different effect was observed on the two subtests.
Finally, the extent to which such training can benefit different
grade levels and content areas (such as math) remains to be seen.
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The present authors are currently investigating such possibilities 
(Taylor & Scruggs, 1983) . 

The usefulness of standardized achievement tests in special 
education has been, and remains, a controversial issue (see Salvia
& Ysseldyke, 1981) not intended to be addressed by the results of
the present investigation. It must be considered, however,
that the observed effect (that of raising mean scores from the 
17th to the 25th percentile) could be sufficient to prevent 
special education referral for some students in schools where such 
test scores are weighted heavily. The present authors do not 
subscribe to the notion that special educational services are 
undesirable, and that students should be "saved" from them
whenever possible. It is our view that referral for special 
education services is a serious procedure which must take into 
account many different considerations, both qualitative and 
quantitative, and for which the ultimate goal must be optimal 
educational service delivery for the individual child. If 
standardized achievement tests are to be used for this purpose,
then it is important that the score obtained be as nearly as
possible a reflection of the child's knowledge of the content area
being assessed.³ To this end, training in test-taking skills may
be useful. There are other ends, however, which we feel ought to
be considered in such training. Since the skills trained in the
present investigation apparently did transfer to a standardized 
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test situation, it seems likely that similar training may 
generalize to other related tasks, e.g., for older students, 
taking a driver's test or an aptitude test relevant to a specific
employment opportunity. 

Finally, test taking can be viewed simply as a common
task in today's schools, but not a particularly pleasant
experience to a mildly handicapped student who typically performs 
poorly, or who does not fully understand testing conventions and 
formats. In this case, training in test-taking skills could be 
regarded as another means to improve the ability of the individual 
child to function in the outside world, a goal to which all
special educators aspire. 
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Footnote 

The preparation of this manuscript was supported in part by a 
grant from the Department of Education, Special Education 
Programs, #6008300008. The authors would like to thank Dr. Joyce
Barnes and the teachers and administrators of the Granite School
District for their cooperation and assistance. The authors would
also like to thank Marilyn Tinnakul and Mary Ellen Heiner for
their assistance in the preparation of this manuscript.

¹A group of second grade LD and BD students was initially
intended for inclusion in this study, but was dropped due to
methodological problems involving sample selection and subject 
attrition. 

²An argument can be made that, since control subjects did not
receive a 'placebo' treatment (i.e., non-instructional contact 
with the experimenters for an equivalent trial period), the 
observed effects may be due to a reaction to the novelty of 
experimenter contact and not the training procedure. A decision
was made not to deliver placebo training to the control group so 
that control subjects would have received additional teacher-led 
instruction as the comparison treatment, and so that their 
instructional time would not have been wasted on non-educational 
treatments. Furthermore, the "novelty" argument seems untenable 
because: (a) a recent meta-analysis by the present authors 
indicated that such subtle treatments were highly unlikely to 
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raise test scores, and (b) such an argument does not explain why 
only one, and not both, subtest scores were raised.

³In fact, a question has been raised concerning to what
extent any assessment data are used for making placement
decisions (see Ysseldyke, Algozzine, Richey, & Graden, 1982, for
a discussion of this issue).

Figure 1. Group by subtest interaction.

Abstract

One hundred three regular class and learning disabled (LD)
students were administered three subtests of the Comprehensive
Test of Basic Skills for which all correct answers had been
identified in the student test booklet. Analysis of the completed
separate answer sheets indicated that LD students answered fewer
total items than their non-disabled counterparts, but did not
differ with respect to percent of items answered correctly. In
addition, descriptive but non-significant differences were found
for number of answer spaces filled in outside the line.
Implications for training and assessment are given.

Can LD Students Effectively Use
Separate Answer Sheets?

Introduction

In recent years, research attention has focused upon the 
skills and strategies learning disabled (LD) students apply
independently to test-taking situations (Taylor & Scruggs, 1983).
Any observed deficiencies in these "test-taking skills" could be
considered (a) a potential source of measurement error (e.g.,
Ebel, 1965), as well as (b) a [illegible] for needed
intervention. And, although research has indicated that
group-administered achievement tests are reliable and valid for LD
students (e.g., Price, 1984), some deficiencies in test-taking
skills have been observed in this population. Scruggs and Lifson
(1985) administered reading comprehension questions to LD and
nondisabled students without providing the accompanying reading
passages. They found that although nondisabled readers were
apparently able to make use of such strategies as partial and/or
prior knowledge, error avoidance, elimination, and use of
information from other test items, LD students were much less
successful. Drawing upon a previous investigation with mostly
nondisabled students (Scruggs, Bennion, & Lifson, 1985), Scruggs,
Bennion, and Lifson (in press) recently interviewed LD and
nondisabled students concerning the "test-taking strategies" they
spontaneously employed on reading achievement tests. It was
concluded that (a) LD students were less successful at selecting
strategies appropriate for different types of test questions, and 
(b) LD students were less successful at adapting to novel test 
formats. Given the number and frequency of format changes on 
standardized achievement tests, these factors could exert a 
potentially strong influence on LD students' test performance 
(Tolfa, Scruggs, & Bennion, in press). 

Another important format change which takes place on 
standardized tests after the primary grades is the inclusion of
separate answer sheets to facilitate machine scoring. The ability
to use separate answer sheets appears to be developmental in
nature, with students in grades one and two showing better
performance levels when test booklets are used as compared with
separate answer sheets (Ramseyer & Cashen, 1971). Cashen and
Ramseyer (1969) indicated that the need for use of the test 
booklet marking decreases as the grade level of the student 
increases. Typically, standardized tests begin the use of 
separate answer sheets in grade four. The implications for the 
fourth or fifth grade learning disabled student functioning two 
years behind his peers in perceptual-motor skills become obvious. 

It has been suggested that students can be trained in the 
skill of separate answer sheet usage (McKee, 1967; Ramseyer & 
Cashen, 1971). McKee (1967) described training third graders to 
successfully use separate answer sheets. However, this study 
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represented more of a subjective evaluation than a tightly 
designed research study. Ramseyer and Cashen (1971) concluded
that first and second graders were unable to utilize separate
answer sheets effectively even after practice sessions. Both
studies (McKee, 1967; Ramseyer & Cashen, 1971) were conducted with
students functioning in regular classrooms.

The present investigation examined the effects of separate 
answer sheet usage with fourth grade learning disabled students. 
The study was conducted to determine if, in fact, fourth grade 
learning disabled students were functioning less efficiently than 
their normally functioning peers in the use of the separate answer
sheet, with relative ability to answer test items controlled. 

Method 

Subjects 

Subjects were 103 fourth grade students enrolled in 
elementary schools in a rural university community in northern 
Utah. All students were enrolled in the fourth grade. Nineteen 
of these students (14 boys and 5 girls) were classified as 
learning disabled according to P.L. 94-142 and Utah State
guidelines, which include average ability coupled with two years
discrepancy on standardized achievement tests. Average Wechsler
Intelligence Scale for Children-Revised (WISC-R) for the LD group
was 97.94 (SD = 8.81); average Total Reading grade equivalent
score from the Woodcock-Johnson was 2.63 (SD = .90) for the LD
students. Eighty-four (48 boys and 36 girls) nondisabled students
were functioning within the regular classroom setting. These
students were functioning at or near grade level, and had not been
identified as "gifted," "remedial," or identified for special
services of any kind. Average Total Reading grade equivalent from
the California Test of Basic Skills was 4.[illegible]4 (SD = 1.42).

Materials

Experimental materials consisted of the test booklet 
appropriate for the fourth grade Comprehensive Test of Basic
Skills (CTBS) and the CTBS fourth grade answer sheet. All correct
responses had been marked with a black arrow in the test booklet.
Subtests one, five, and seven were selected as target subtests.
All subtests contained 45 questions. A presenter's script was
prepared.

Procedure

Nineteen learning disabled students and 84 regular class 
fourth graders were administered the three subtests by one of 
three examiners. Examiners were given a written script to ensure 
all students received the same directions. All students were 
administered the assignment in a group setting with the exception 
of three LD students who were administered the exercise 
individually in their resource room setting. 

Students were told that they would be given a test that 
already had the correct answers marked and that their task was to 
mark the correct answers on the separate answer sheet. They were 
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told to work as quickly and carefully as possible; they would be 
given three minutes to work on each subtest. Students and 
examiners worked the examples together, and examiners checked to 
ensure students were completing the correct subtest sections on 
the answer sheet. 

Answer sheets were scored by recording number of items 
completed, number of items answered correctly, and number of items 
marked outside the established 5 mm radius from the center of each 
answer circle for each subtest. This distance represented the
point at which the pencil mark could intrude into an adjacent
answer space.
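
The three scoring measures described above can be illustrated with a short sketch. The data layout is hypothetical: the report does not say how marks were measured or recorded, so the `Mark` fields (including the offset of the farthest point of each pencil mark from the circle center) and the function name are assumptions made for the example. Only the 5 mm radius comes from the text.

```python
import math
from dataclasses import dataclass

RADIUS_MM = 5.0  # scoring radius stated in the text


@dataclass
class Mark:
    item: int      # item number on the answer sheet
    choice: str    # answer space the student filled in
    dx_mm: float   # hypothetical: horizontal offset of the mark's farthest point
    dy_mm: float   # hypothetical: vertical offset of the mark's farthest point


def score_sheet(marks, answer_key):
    """Return items completed, percent correct, and percent marked outside the radius.

    Both percentages are taken over the items actually marked, as in the report.
    """
    completed = len(marks)
    if completed == 0:
        return {"completed": 0, "percent_correct": 0.0, "percent_outside": 0.0}
    correct = sum(1 for m in marks if answer_key.get(m.item) == m.choice)
    outside = sum(1 for m in marks if math.hypot(m.dx_mm, m.dy_mm) > RADIUS_MM)
    return {
        "completed": completed,
        "percent_correct": 100.0 * correct / completed,
        "percent_outside": 100.0 * outside / completed,
    }


key = {1: "A", 2: "B", 3: "D", 4: "A"}
sheet = [Mark(1, "A", 1.0, 1.0), Mark(2, "B", 4.0, 4.0), Mark(3, "C", 0.0, 0.0)]
result = score_sheet(sheet, key)
```

In this made-up sheet the student reaches three of four items, copies two correctly, and strays outside the radius once, since a point 4 mm over and 4 mm up lies about 5.7 mm from the center.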

Results 

Each subtest was evaluated based on total number of items
completed, total percent marked correctly, and total percent 
marked outside the circle (i.e., more than 5 mm from the center). 
For total completed, students in the nondisabled group obtained a 
mean score of 96.65 (SD = 18.8), while students in the learning 
disabled group obtained a mean score of 86.2 (SD = 18.0). These 
differences were statistically significant in favor of the 
nondisabled group, t(99) = 2.19, p = .03. For percent of marked
items answered correctly, however, differences were not observed. 
Students in the nondisabled group recorded 98% (SD = .06) of their
answers correctly, while LD students marked 96% (SD = .13) of
their answers correctly. Because obtained variance differed for
the two groups (p < .01), a separate variance estimate was used,
with a correction for degrees of freedom (Ferguson, 1982) which
yielded a t(20) = .61, p = .55.

In addition, a descriptive, non-significant difference was
found when groups were compared with respect to percent of answer
spaces marked outside the line, t(21) = 1.71, p = .10 (separate
variance estimate). Descriptively, the nondisabled group marked
an average percent of 7.8 (SD = 8.6) answers outside the line,
while the LD sample marked an average percent of 13.0 (SD = 12.7)
answers outside the line, assessed as a function of total number
of answers marked.
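
The separate variance estimate with corrected degrees of freedom can be sketched from the summary statistics alone. The version below uses the Welch-Satterthwaite form of the correction, which is an assumption about the exact method in Ferguson (1982). It also assumes group sizes of 84 and 19; the pooled test above reports 99 degrees of freedom, so a few cases may have been missing, and the rounded means and standard deviations make the result approximate.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Separate-variance t statistic and Welch-Satterthwaite degrees of freedom."""
    v1 = sd1 ** 2 / n1   # squared standard error contributed by group 1
    v2 = sd2 ** 2 / n2   # squared standard error contributed by group 2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Percent of answers marked outside the line: LD group vs. nondisabled group.
t_outside, df_outside = welch_t(13.0, 12.7, 19, 7.8, 8.6, 84)
```

With these inputs the statistic comes out near 1.70 on just under 22 degrees of freedom, close to the reported t(21) = 1.71. The degrees of freedom fall far below the pooled value because the smaller LD group has the larger variance.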



Discussion

LD students were seen to differ significantly from
nondisabled students with respect to ability to utilize a separate 
answer sheet in answering standardized achievement test questions. 
These differences were most pronounced in the area of speed and
less pronounced in the areas of accuracy and neatness, although
descriptive differences were also found in these areas. The 
present data strongly suggested that the achievement test 
performance of LD students may be differentially hampered in 
performance by separate answer sheets, resulting in increased 
measurement error. Further research is needed, however, to 
document the exact extent performance may be inhibited under 
standardized test administration conditions. 

Two possible interventions can be imagined to help correct 
such possible difficulties: One possibility is to modify the
tests themselves, while the other possibility is to train LD
students to be more efficient with separate answer sheets. And, 
in fact, such procedures have recently received attention in the 
research literature. Beattie, Grise, and Algozzine (1982) 
assessed the effectiveness of several test modifications,
including imbedding the answer circle within the test booklet, on
the competency test performance of LD students. Although some 
descriptive advantages were noted, the overall modifications 
failed to produce any strong consistent effect. With respect to 
the second possibility, attempts to train LD children in use of 
novel test formats, including separate answer sheets, have been 
successful. Scruggs and Tolfa (1985) and Scruggs and Mastropieri
(in press) reported successfully teaching such 'test-taking
skills' to LD students, to the extent that test performance,
subsequent to training, was significantly higher than that of
untrained controls. The fact that resulting effect sizes in these
investigations were higher than those usually reported in the
literature (Scruggs, Bennion, & White, in press) supports the
notion that LD students may indeed demonstrate relative deficits
in a variety of 'test-taking skills' (Scruggs & Lifson, in press). 
Further research can do much to further describe the nature of 
such deficits, and develop effective means of remediation. The 
present authors are, in fact, currently engaged in such an effort 
(Taylor & Scruggs, 1983) . 

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Abstract 

Ninety-six behaviorally disordered and more average students were
administered a test attitude survey immediately after district- 
wide standardized achievement testing. Results were consistent 
with previous research which suggested behaviorally disordered 
students may report lower attitudes than their more typical peers. 
In addition, differentially lower scores were found for 
behaviorally disordered girls, while no sex differences were found 
in the more average group.

Attitudes of Behaviorally Disordered Students
toward Tests: A Replication

The behaviorally disordered student is thus classified based
on average or near average intellectual ability in addition to
social or emotional functioning that is substantially different
from other students the same age. Behaviorally disordered
students have repeatedly shown academic deficiencies (Mastropieri,
Jenkins, & Scruggs, 1985; Motto & Wilkins, 1968; Stone & Rowley,
1964). Several variables, including attitude toward school
subjects (Silberberg & Silberberg, 1971), impulsivity (Letteri,
1979), and responses toward test-taking situations (Forness &
Dvorak, 1982; Scruggs & Mastropieri, in press; Scruggs,
Mastropieri, Tolfa, & Jenkins, 1985), have been identified as
possible contributing factors to academic deficiencies.



The present study investigates the behaviorally disordered
student's attitude toward test-taking situations. In the Scruggs
et al. (1985) study, conflicting results were found. In Study 1,
responses of fifth and sixth grade behaviorally disordered
students were compared with those of their normally functioning
peers on a 12-item test-attitude survey. Results indicated that
the behaviorally disordered students differed significantly from
their normally functioning peers on the overall survey as well as
the specific factors involving subjective feelings about tests and
feelings about the personal importance of tests. Groups did not
differ with respect to evaluation of the objective value of tests.
The sample in this study was relatively small (N = 37), however,
and the survey contained too few items to draw firm conclusions.
In Study 2 of the same investigation, 75 regular classroom
students and 25 self-contained behaviorally disordered students
were administered a longer test attitude survey. Groups, which
were equivalent with respect to number, age, sex, and grade, were
then compared. There was no difference between groups on the
total survey, or on "personal feeling" items, or on "value of
tests" items. Scruggs et al. (1985) proposed several possible
explanations for these discrepant findings, including the fact
that Study 2 was conducted at the beginning of the school year
when students had not had much recent experience with test-taking,
while Study 1 was conducted at the end of the previous school year
after students had recently experienced testing situations.

The present investigation was conducted to help clarify the
conflicting results of the Scruggs et al. (1985) investigation. A
larger population, including a greater number of grade levels, was
compared on a revised version of the test attitude survey utilized
in Study 2 of the Scruggs et al. (1985) investigation. In
addition, a larger sample of girls was employed in the present
investigation so that an evaluation of possible group by sex
interaction effects could be made.

Method

Subjects

Subjects were 96 elementary school children attending a
public school in a western metropolitan community. Students were
enrolled in grades one through six. Forty-eight of these students
were classified as behaviorally disordered, while 48 were more
typical students enrolled in regular classrooms in the same
school. To be included in the study from the regular classroom,
students were selected at random, using a stratified random
sampling technique, from a population of 122 students representing
the same grade levels. When possible, equal numbers of boys and
girls per grade level were selected to match numbers represented
in the target population. The breakdown by grade level and sex
for each group was as follows: three students (1 boy, 2 girls)
were enrolled in first grade, eight students (5 boys, 3 girls) in
second grade, four students (all boys) in third grade, eight
students (6 boys, 2 girls) in fourth grade, eleven students
(behaviorally disordered = 9 boys, 2 girls; regular class = 6
boys, 5 girls) in fifth grade, and fourteen students (behaviorally
disordered = 11 boys, 3 girls; regular = 9 boys, 5 girls) were
enrolled in sixth grade.

Students were identified as behaviorally disordered according
to state and P.L. 94-142 guidelines, which included students
exhibiting behavior or emotional conduct over time which adversely
affected educational performance, and required special education
services in self-contained classrooms.
Materials and Procedure 

The 17-item Test Attitude Survey was constructed based on
results from previous investigations which also examined
test-taking attitudes of students (Scruggs, Bennion, & Williams, 1985;
Scruggs, Mastropieri, Tolfa, & Jenkins, 1985) and contained such
items as "tests are an important part of school," "tests are more
important to the teacher than to me," "tests are a waste of time,"
"I try my best when I take a test," and "I do poorly on tests."
Items were intended to reveal students' feelings of the importance
of tests to themselves and to parents and teachers, as well as
their own feelings toward tests.

The measure was administered immediately subsequent to yearly
achievement testing. Administration of the survey was conducted
in the students' regular classroom, and items were answered
together as the teacher read each item aloud. Students were given
1 point for a positive response (i.e., "yes" to a positive
statement, or "no" to a negative statement) and 0 points for a
negative response.
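
The scoring rule just described can be written out in a few lines. The four statements are quoted from the survey description above, but which of them counted as positive or negative is an assumption made for this sketch (the report does not list the keying), and the function names are hypothetical.

```python
# Assumed keying: True marks a positively worded statement, False a negative one.
ITEM_IS_POSITIVE = {
    "tests are an important part of school": True,
    "tests are a waste of time": False,
    "I try my best when I take a test": True,
    "I do poorly on tests": False,
}


def score_response(statement_is_positive, answer):
    """1 point for 'yes' to a positive item or 'no' to a negative item, else 0."""
    said_yes = answer.strip().lower() == "yes"
    return 1 if said_yes == statement_is_positive else 0


def score_survey(answers):
    """Total score for one student; `answers` maps item text to 'yes' or 'no'."""
    return sum(score_response(ITEM_IS_POSITIVE[item], answer)
               for item, answer in answers.items())


student = {
    "tests are an important part of school": "yes",
    "tests are a waste of time": "no",
    "I try my best when I take a test": "yes",
    "I do poorly on tests": "yes",
}
total = score_survey(student)  # 3 of a possible 4 on this shortened item set
```

On the full 17-item survey the same rule gives totals from 0 to 17, with higher totals indicating more positive reported attitudes.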

Results

The reliability (Kuder-Richardson 20) of the present survey
for this sample was .81, which was slightly higher than that of
previous coefficients of .76 and .75 (Scruggs et al., 1985).
Response data were entered into a 2 (group) x 2 (sex) analysis of
variance (ANOVA), and yielded significance for groups, F(1,92) =
19.73, p < .001. No significant main effect was found for sex,
F(1,92) = 2.46, p = .12. Finally, the interaction of group by sex
was seen to closely approach significance, F(1,92) = 3.59,
p = .06. Follow-up t tests indicated that girls in the
behaviorally disordered group reported differentially lower
attitudes (t(46) = 3.56, p < .001), while boys and girls in the
more average group did not differ (t < 1). Descriptively, the
more average group reported more positive attitudes than the
behaviorally disordered group, with mean scores of 14.56 (SD =
2.03) and 12.15 (SD = 3.69), respectively. Sex by group
differences are depicted graphically in Figure 1.
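
For readers unfamiliar with the reliability coefficient reported above, the Kuder-Richardson 20 formula for items scored 0 or 1 can be sketched as below. The response matrix is made up for illustration, the function name is hypothetical, and the use of the population variance of total scores is one common convention; the report does not say which variance estimate was used.

```python
from statistics import pvariance


def kr20(responses):
    """Kuder-Richardson 20 for a list of per-student lists of 0/1 item scores."""
    n_people = len(responses)
    n_items = len(responses[0])
    totals = [sum(person) for person in responses]
    total_variance = pvariance(totals)   # population variance of total scores
    pq_sum = 0.0
    for j in range(n_items):
        p = sum(person[j] for person in responses) / n_people  # proportion scoring 1
        pq_sum += p * (1 - p)
    return (n_items / (n_items - 1)) * (1 - pq_sum / total_variance)


# Four hypothetical students answering three items.
toy = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
reliability = kr20(toy)  # 0.75 for this toy matrix
```

The coefficient rises as the variance of total scores grows relative to the summed item variances, that is, as items tend to be answered consistently by the same students.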



Insert Figure 1 about here 



A factor analysis of responses for the total group was
calculated using the same procedures as in the Scruggs et al.
(1985) investigation. In this analysis, however, meaningful
factors consisting of more than two or three items were not
identifiable. This finding was inconsistent with that of Scruggs
et al. (1985, Study 1), which identified three distinct factors:
(a) personal importance of tests, (b) objective worth of tests,
and (c) personal feelings about tests.

Discussion

The present investigation replicated the findings of Study 1 
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in Scruggs/ et al . (19,85), and suggested that behavioral \y 

« •«..'■■ * 

^disordered children do report different, -less positive-attitudes, 

■toward test-taking situations than their more normally functioning 

peer^. This study also expanded previous findings to include 

grades one through six. ^ a % 

Although sample size and matching procedures more closely
paralleled Experiment 2 of the Scruggs et al. (1985)
investigation, the findings of that study and the present
investigation were in opposition. This may suggest that the time
of the school year influenced students' responses. While Study 2
in the Scruggs et al. (1985) investigation was conducted during
the beginning of the year, when students had not recently undergone
testing, the present study was conducted following the yearly
administration of the standardized tests. The exposure to the
testing situation may have given students a more realistic outlook
on their test-taking attitudes.

Finally, although the sex by group interaction was not
significant by conventional standards, the effect was sufficiently
tangible to warrant further investigation.

These results suggest that behaviorally disordered students
do differ from their normally functioning peers on test-taking
attitudes. Further research could do much to clarify any possible
causal relation between test scores and test attitudes of
behaviorally disordered students.

Figure Caption

Figure 1: Sex by group interaction.

Abstract
Eighty-five mildly handicapped (learning disabled or behaviorally
disordered) students were assigned at random to either a control
condition or a condition in which students received five days'
training on test-taking skills relevant to the Stanford
Achievement Test. Results of test scores indicated that trained
students scored significantly higher on tests of reading decoding
and math concepts. A significant interaction between experimental
group and handicapping condition revealed that students classified
as behaviorally disordered had differentially benefited on the
math concepts subtest. Finally, a descriptive but non-significant
difference favoring trained students was found on the math
computation subtest.

The Effects of Coaching on the Standardized Test
Performance of Mildly Handicapped Students
In recent years, researchers have attempted to identify
sources of measurement error in handicapped populations. Such
research is of importance because handicapped children are often
among those most frequently tested in public schools, and because
these populations have often been underrepresented in test
standardization procedures (Fuchs, Fuchs, Dailey, & Power, 1985).
Testing influences research has generally focused on the following
issues: examiner effects, test anxiety and attitudes, and
test-taking skills, or "test-wiseness" (Millman, Bishop, & Ebel,
1965).

Fuchs, Fuchs, Power, and Dailey (in press) tested handicapped
(speech or language impaired) and nonhandicapped children using
familiar and unfamiliar examiners and concluded that examiner
familiarity had a differentially facilitating effect on handicapped
children. This finding is supported by previous research efforts
(Fuchs, Fuchs, Dailey, & Power, 1985; Fuchs, Fuchs, Garwick, &
Featherstone, 1983). Field (1981), however, found examiner
familiarity or recent experience with nonhandicapped children had
a negative effect on the test scores of developmentally handicapped
preschool children. Dangel (1972) examined the influence of
pretest referral information provided to examiners (examiner bias)
on the intelligence scores of retarded students and reported that
scores did not differ as a function of examiner bias.

Test anxiety and test attitudes have also been recently
investigated with handicapped populations, but findings here have
not always been in agreement. Bryan, Sonnefeld, and Grabowski
(1983) reported that learning disabled (LD) students were more
test-anxious than their nondisabled counterparts, while SU^a
(1977) found no such relation. Wolf (1975) reported that
anxiety-reduction training had no effect on the performance of
"test-anxious" behaviorally disordered (BD) boys. Finally,
Scruggs, Mastropieri, Tolfa, and Jenkins (1985), and Tolfa and
Scruggs (1985a) found that BD students reported more negative
attitudes toward tests than their more average age peers.

In the area of test-taking skills, recent research has
supported the notion that mildly handicapped (particularly LD)
students exhibit deficiencies in this area with respect to
standardized achievement tests. LD students have been shown to
exhibit deficiencies in the use of prior knowledge and deductive
reasoning strategies (Scruggs & Lifson, 1985), selection of
appropriate strategies and attention to appropriate format
features (Scruggs, Bennion, & Lifson, 1985; in press), and
effective use of separate answer sheets (Tolfa & Scruggs, 1985b).
Although standardized achievement tests have generally been found
to be reliable and valid with mildly handicapped students (e.g.,
Pierce, 1984), results of the above test-taking skills research
suggest that measurement error could be reduced (and consequently,
scores improved) if mildly handicapped students could be
successfully trained in "test-taking skills."

Much research has been conducted in the area of training in
test-taking skills, but little of this research has addressed
handicapped populations. In a recent meta-analysis, Scruggs,
Bennion, and White (in press) examined the effects of such
coaching on achievement test scores of elementary school children.
They concluded that, in general, coaching had a very small overall
effect on test scores, with somewhat larger effects being found
for younger students, lower SES students, and students who had
undergone longer training periods. No research was located in
which mildly handicapped students had been trained, although, more
recently, such training has been accomplished. Scruggs and Tolfa
(1985) reported that a small sample of trained LD students had
scored higher than controls on standardized word analysis test
items, while no differences were found for reading comprehension
items. These same findings were replicated by Scruggs and
Mastropieri (in press) using a larger subject sample of LD and BD
students. It was concluded that such training could have a strong
facilitative effect (8-10 percentile points) on reading subtests
with more complicated format demands, as suggested by Tolfa,
Scruggs, and Bennion (in press). The findings of Scruggs and
Mastropieri (in press) and Scruggs and Tolfa (1985), although
encouraging, left several issues unaddressed. First, the subjects
in these investigations were mostly primary level students
generally less familiar with testing situations than older
students. It would be of interest to know whether upper
elementary students could benefit from such training. Second,
training was only given in reading subtest areas, leaving open the
question of whether such training could facilitate performance on
mathematics subtests. Finally, only the Scruggs and Mastropieri
(in press) investigation included BD students, and in that study,
students were not stratified by handicapping condition and
therefore analysis of any possible treatment by handicapping
condition interaction was not possible. It was, therefore, the
purpose of the present research to replicate and extend previous
findings of training in test-taking skills to include (a) upper
elementary students, (b) mathematics as well as reading subtests,
and (c) separate analysis of test performance by different
handicapping condition.

Method

Subjects

Subjects were 85 LD and BD students attending public schools
in a western metropolitan area. Forty-four students had been
classified as learning disabled and 41 students had been
classified behaviorally disordered by national, state, and local
standards. These standards included, for LD students, a forty
percent discrepancy between ability (assessed by individual
intelligence tests) and two areas of academic achievement.
Although LD students in the present sample exhibited discrepancies
in several different content areas, most had been referred for
deficiencies in reading, followed by deficiencies in mathematics
functioning. Behaviorally disordered students were classified by
teacher and school psychologist documentation of deficiencies in
social or emotional functioning which interfered with classroom
learning. These referrals were made for several different
reasons, but in most cases students had exhibited aggressive or
non-compliant behaviors in the classroom which interfered with
routine classroom activities.

The sample included 21 4th, 38 5th, and 26 6th grade
students, composed of 63 boys and 22 girls. Mean Wechsler
Intelligence Scale for Children-Revised (WISC-R) score for the
experimental group was 92.45 (SD = 10.20). Mean WISC-R for the
control group was 91.48 (SD = 9.64). Achievement test scores for
the sample are provided in the Results section.

Materials

Materials were developed specifically for the present
investigation and consisted of (a) a practice test booklet with
correct answers identified for practice with separate answer
sheets, and (b) a practice test booklet with unmarked problems
similar to, but not identical to, items in the Stanford
Achievement Test. Items were included which resembled those in
two reading subtests (comprehension, word study skills) and three
math subtests (concepts, computation, and word problems).

Procedure 



Students were stratified by grade level and handicapping
condition, and assigned at random to either a training or a
no-treatment control condition. Training condition students were
seen in small (1-6) groups by one of three trained experimenters
for five 20-30 minute sessions. In the first session, students
were given instruction and practice in the use of separate answer
sheets using a practice test booklet for which correct items had
been indicated with an arrow. Students were instructed in finding
and monitoring their place on the answer sheet, marking and
erasing carefully, and in checking their work. The second and
third sessions consisted of training in reading subtests. For the
reading comprehension subtest, students were taught to refer back
to the passage for recall questions, to use deductive reasoning
strategies for inference questions, and to look for similarities
between phrases or words in the passage and answer choices. For
the word study skills subtest, students were taught to attend to
appropriate cues and sound, rather than letter, similarities in
stem and option. For the math concepts subtest, students were
taught to attend carefully to format changes. For the computation
subtest, students were taught to carefully recopy problems on
scratch paper in the most familiar form and more neatly. Finally,
on the word problems subtest, students were taught to attend to
command words in the problem and work problems carefully on
separate paper. On all subtests, students were taught to (a) work
quickly and carefully, (b) check answers if time permits, (c)
answer all questions, (d) eliminate answers known to be incorrect,
(e) incorporate prior or partial knowledge, and (f) become
familiar with all subtest format demands.

The next week after training, all students were administered
the Stanford Achievement Test by regular school personnel.
Completed answer sheets were machine scored.

Results

Percentile scores were chosen for the present analysis
because of their consistency across grade levels and because of
their meaningfulness. Since previous research has indicated
different effects are found for different subtests, separate
condition (training vs. control) by handicap (LD vs. BD) analyses
of variance (ANOVAs) were computed for each subtest. Significant
differences were found for the word study skills and math concepts
subtests, in favor of the training condition. On the word study
skills subtest, control students scored at an average of the 17.5th
percentile, while trained students scored at an average of the
26.4th percentile, F(1,81) = 4.79, p = .03. No significant
differences were found for handicapping condition, F(1,81) = 1.53,
p < .56 (MSe = 361.2). On the math concepts subtest, control
students scored at an average of the 16.4th percentile, while
training condition students scored at an average of the 24.1st
percentile, F(1,81) = 4.54, p = .04 (MSe = 288.3). No
significant difference was found for handicapping condition,
F(1,81) = 1.14, p = .29, but an interaction effect was noted,
F(1,81) = 4.58, p = .04, indicating differential facilitation on
the part of the BD students. This interaction is depicted
graphically as Figure 1.



Insert Figure 1 about here



Additionally, the main effect for experimental condition (but
not handicap or interaction) approached significance on the
mathematics computation subtest, F(1,81) = 2.57, p = .11.
Descriptively, trained students scored at the 21.5th percentile
while control students scored at the 15.5th percentile
(MSe = 284.3). Main effects or interactions did not approach
significance on the reading comprehension or the math applications
subtest (all Fs < 1). Descriptively, differences by condition
were negligible, with experimental vs. control mean percentiles of
19.0 and 17.7, respectively, for reading comprehension, and 23. $
and 20.7 for math applications. In both cases, however,
descriptive differences favored training condition students.
Obtained effect sizes for all subtests are given in Table 1.

Insert Table 1 about here

Discussion



The findings of the present investigation replicate the
findings of Scruggs and Tolfa (1985) and Scruggs and Mastropieri
(in press), extend them into upper elementary grades and
mathematics subtests, and allow comparison of LD vs. BD student
performance. That trained students outperformed controls on word
study skills and mathematics concepts subtests supports the
hypothesis of Tolfa, Scruggs, and Bennion (in press) that tests
with more complicated formats may prove differentially difficult
for mildly handicapped students. That is, the word study skills
and mathematics concepts subtests each contain several potentially
confusing format changes, while reading comprehension and math
applications (i.e., word problems) subtests contain more "obvious"
format demands and fewer format changes. Although significant
main effects were not found for total reading, total math, and
total test, resulting effect sizes of these scores were
substantially higher than those reported in the literature for
nonhandicapped children (Scruggs, Bennion, & White, in press).

The obtained interaction by handicapping condition on the
mathematics concepts subtest may simply represent characteristics
of the present sample, but certainly deserves further research
attention. Since mathematics functioning has been noted as a
particular area of difficulty for BD students (Mastropieri,
Jenkins, & Scruggs, in press), perhaps reflecting problems with
attention and persistence of effort, it is possible that training
in this case lessened the need to understand formats and thus
represented a more valid indication of actual ability.

The results of this and previous research indicate that
test-taking skills can be taught to mildly handicapped elementary
age students, and that this training can significantly affect
test performance. Future research efforts are needed to assess
whether similar training can also benefit secondary level mildly
handicapped students, and whether training can improve scores on
teacher-made tests. The present authors are currently
investigating such possibilities (Taylor & Scruggs, 1983).
Table 1

Obtained Effect Sizes

Subtest                     Effect Size*

Reading Comprehension           .10
Word Study Skills               .53
Math Concepts                   .59
Math Computation                .47
Math Applications               .15
Total Reading                   .40
Total Math                      .47
Total Test                      .36

*All effect sizes were computed using control standard deviation
as divisor and E-C mean differences in the numerator.
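The effect size convention stated in the footnote to Table 1
(experimental minus control mean, divided by the control group
standard deviation) can be sketched as follows. The function names
are illustrative only. Because the control standard deviations are
not reported in the text, the second function merely backs out the
value implied by a reported effect size and the reported means; it is
shown as arithmetic, not as a figure from the study.

```python
def effect_size(exp_mean: float, ctrl_mean: float, ctrl_sd: float) -> float:
    """Effect size as (E - C) / control SD, per the Table 1 footnote."""
    if ctrl_sd <= 0:
        raise ValueError("control standard deviation must be positive")
    return (exp_mean - ctrl_mean) / ctrl_sd


def implied_control_sd(exp_mean: float, ctrl_mean: float, es: float) -> float:
    """Control SD implied by a reported effect size and the two means."""
    if es == 0:
        raise ValueError("effect size of zero implies no unique SD")
    return (exp_mean - ctrl_mean) / es


# Word study skills: reported means of 26.4 (trained) and 17.5
# (control) with a reported effect size of .53 imply a control SD
# of roughly 16.8 percentile points (an inferred, not reported, value).
word_study_sd = implied_control_sd(26.4, 17.5, 0.53)
```

Dividing by the control group's standard deviation rather than a
pooled value keeps the yardstick unaffected by any change in score
spread that the training itself might produce.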
Figure Caption

Figure 1: Condition by handicap interaction: Math concepts.


Abstract 

The popular conception of test-wiseness is reviewed and evaluated.
Although some support for the concept of test-wiseness exists, in
general the influence of test-wiseness with respect to (a)
contribution to measurement error, (b) cultural differences, (c)
independence from general intelligence, and (d) facility for
training, has been greatly overestimated. This paper attempts to
place commonly found statements regarding test-wiseness in
perspective of actual research findings.

Current Conceptions of Test-Wiseness:
Myths and Realities

It has been known for many years that all test scores reflect
two additive elements: "true" score, accounting for the construct
being measured, and "error" score (Magnusson, 1967). It has also
been suggested that the error score may be itself composed of
several additive components (Ebel & Damrin, 1960; Thorndike,
1951). These components have been said to include test anxiety
(e.g., Sarason, 1978), achievement motivation (e.g., Atkinson,
1974; Chapman & Hill, 1971), and self-esteem (e.g., Roen, 1960).
Such possible elements of measurement error have been discussed in
detail by Jensen (1980).

Since 1965, an additional construct has been discussed
repeatedly in the literature which is commonly thought to involve
a substantial source of measurement error. This construct was
defined by Millman, Bishop, and Ebel (1965) as "test-wiseness"
(TW). Millman et al. defined TW as "a subject's capacity to
utilize the characteristics and formats of the test and/or the
test-taking situation to receive a high score" (p. 707). They
further described TW as "logically independent of the examinee's
knowledge of the subject matter for which the items are supposedly
measures" (p. 707). Ebel (1965) has suggested that error in
measurement is more likely to be obtained from students low in
test-wiseness. The student low in TW, therefore, may be more of a
measurement problem than the student high in TW (Slakter, Koehler,
& Hampton, 1970).

Analysis and Measurement of TW

Millman, Bishop, and Ebel (1965) have provided a definition
and analysis of the construct on which most subsequent research
has been based (Sarnacki, 1979). Millman et al. defined TW as
distinct from general mental attitudes such as confidence and
anxiety, and motivational states of the test-taker. In their
analysis of TW, six elements were delineated. Four of these
elements were considered to be independent of the test constructor
or test purpose, while two were considered to be dependent on test
constructor or test purpose. The four independent elements
included (a) time using strategies, (b) error avoidance
strategies, (c) guessing strategies, and (d) deductive reasoning
strategies. Time using strategies included working quickly and
efficiently and saving more difficult or time-consuming items for
last. Error avoidance strategies included attending to
directions, marking answers carefully, and checking all answers.
Guessing strategies were considered to be the use of guessing when
it was likely to benefit the test-taker. Deductive reasoning
strategies included elimination of items known to be incorrect,
item choices based on an analysis of the relation among items,
such as choosing neither of two items which imply the correctness
of each other (similar options), and use of content information
from other test items and options.

The two elements thought to be dependent upon test
constructor or purpose were intent consideration strategies and
cue-using strategies. Intent consideration strategies included
adopting the appropriate level of sophistication for the test, and
considering the purpose of the test constructor. Cue-using
strategies referred to the use of any consistent idiosyncrasies of
the particular test constructor, such as inclusion of more true or
false statements, placement of the correct option, and grammatical
inconsistencies between stem and options. Avoidance of items
using the words "always" and "never" (specific determiners) was
also considered a cue-using strategy.

Researchers have typically assessed TW in one of two indirect
ways. One method is to teach TW skills to a population and assess
the extent to which scores improve. The other method is to
construct questions which are answerable only by use of specific
TW skills and imbed these items in a larger test of answerable
items. An example of an item answerable in terms of a TW strategy
(similar options) was given by Slakter, Koehler, and Hampton
(1970, p. 249):

     "When Bestor crystals are added to water:
     1. Heat is given off;
     2. The temperature of the solution rises;
     3. The solution turns blue;
     4. The container becomes warmer."

The keyed answer to this item is (3), since the other options
imply the correctness of each other. In a similar fashion,
guessing strategies have been assessed by indicating a penalty for
incorrect responses, and imbedding nonsense items for which no
answer is correct. The extent to which subjects answer such
nonsense items was considered a measure of guessing strategies
(Slakter et al., 1970). Finally, such general TW strategies as
use of prior or partial knowledge, deductive reasoning, and use of
prior items have been assessed by administering reading
comprehension test questions for which the referent reading
passages have been deleted (e.g., Dunn, 1981; Scruggs & Lifson,
1985).

Since the initial analysis by Millman et al. (1965), a
voluminous literature has emerged, reviews of which have been
written by Bangert-Drowns, Kulik, and Kulik (1983), Ford (1973),
Fueyo (1977), Jones and Ligon (1981), and Sarnacki (1979). These
reviews are all thorough to the extent that they cover adequately
the body of literature referring to TW as it has been evaluated
over the past two decades. It is the view of the present authors,
however, that much of the influence associated with TW has been
overstated to the point of distortion. It is the purpose of the
present paper to clarify some issues regarding the construct
"test-wiseness" and its consequences.

Commonly made statements regarding TW which are considered to
be "myths" (by the present authors) include the following: (a)
there is no substantial correlation between test-wiseness and
intelligence, (b) TW constitutes a large source of variance which
is commonly found in tests, (c) different American cultural groups
are seen to differ substantially with respect to test-wiseness,
and (d) test-wiseness is easily trained and results in substantial
increases in test scores. These "myths" will be considered
separately, followed by review of literature relevant to each, and
a discussion of the realities associated with each particular
myth.

Myth #1: TW is Not Substantially Related
to General Intelligence

This myth is based largely upon the assumption that TW
constitutes essentially an unfair advantage on test-taking tasks
which some students have happened to acquire arbitrarily, while
others have not. In addition, TW loses much credibility as a
construct if it can be shown to be highly related to intelligence,
and therefore not a specific, independent factor. Finally, if TW
is not strongly related to intelligence, then it appears more
likely that it can be easily trained; consequently, groups who can
be shown to suffer with respect to TW would hypothetically benefit
greatly from short instructional lessons in TW.

Millman et al. (1965) suggested that a test-wise subject
would perform better on tests than would a less test-wise subject
of equal intellectual ability. Wahlstrom and Boersma (1968)
maintained, "while 'good' items may be used to control for error
variance associated with test-wiseness, the writers contend that
teacher-made achievement tests contain items with faults, and that
test-wise subjects often received higher scores than subjects of
equal intellectual ability" (p. 419).

The basis for this particular myth is found in a small number
of empirical studies, whose interpretations have been greatly
distorted. These investigations will be discussed in turn.

Dunn and Goldstein (1959) correlated scores on a group
administered intelligence test (Army Aptitude Area 1) with scores
on blocks of multiple choice items containing specific item flaws.
These authors argued that since moderate correlations (.52-.72)
were found between IQ and item blocks containing different TW cues
as well as items containing no TW cues, "the ability to pick up
cues on the type of material tested may be found at all levels of
intelligence" (p. 178). In this investigation, however, no direct
assessment of the relation between IQ and TW was made.

Kreit (1968) hypothesized that the intelligence of subjects
is related to the acquisition of test-taking skills, and that more
intelligent children would improve more from test session to test
session. This hypothesis was not supported. Kreit reported only
nonsignificant trends in the hypothesized direction. In this
investigation, however, narrow and overlapping groups comprising
his sample precluded a fair assessment of his hypothesis. This
author, then, did not demonstrate the lack of a strong relation,
but merely failed to support his own predictions with respect to
one aspect of the TW/intelligence issue.

The most commonly cited study with respect to test-wiseness
and intelligence was conducted by Diamond and Evans (1972). These
researchers concluded that TW is cue-specific (that is, not one
general ability) and that the overall correlation between the
aspects of TW tested was not strong. In fact, the overall
correlation between IQ and TW reported by Diamond and Evans was
.49 which, if corrected for attenuation of the somewhat unreliable
test-wiseness test, becomes a correlation of .61. In either case,
the obtained correlation is strong enough to constitute a moderate
relation between test-wiseness as measured and general ability.
The conclusions of Diamond and Evans, although unwarranted, have
been consistently cited by others more interested in perpetuating
the myth of this aspect of TW than accurately reporting the data.
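The correction for attenuation mentioned above can be illustrated
with a short sketch. It assumes the standard Spearman formula and,
following the wording of the text, treats only the test-wiseness
measure as unreliable (the reliability of the IQ measure is set to
1.0). The reliability value is not reported here, so the sketch only
backs out the value implied by the two correlations; function names
are illustrative.

```python
import math


def correct_for_attenuation(r_xy: float, rel_x: float = 1.0,
                            rel_y: float = 1.0) -> float:
    """Spearman's correction: r_xy / sqrt(rel_x * rel_y)."""
    if not (0 < rel_x <= 1 and 0 < rel_y <= 1):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_xy / math.sqrt(rel_x * rel_y)


def implied_reliability(r_observed: float, r_corrected: float) -> float:
    """Reliability of one measure implied by the two correlations,
    assuming the other measure is treated as perfectly reliable."""
    return (r_observed / r_corrected) ** 2


def shared_variance(r: float) -> float:
    """Proportion of variance shared by two measures (r squared)."""
    return r * r


# Observed r = .49 and corrected r = .61 imply a TW test
# reliability of about .65 (an inferred value, not one reported).
tw_reliability = implied_reliability(0.49, 0.61)
corrected = correct_for_attenuation(0.49, rel_x=tw_reliability)
```

On these figures the two measures share roughly 24 percent of their
variance before correction and roughly 37 percent after it, which is
the sense in which the relation is described as moderate.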

Other researchers, not as widely cited, have provided
stronger information that TW and intelligence are in fact related.
Anderson (1973) reports, "analysis of the correlational data
indicates that for the total sample a significant (though
moderate) correlation is obtained between TW and mental ability,
between TW and achievement, and between TW and deductive reasoning
ability" (p. 89). Millikin (1975) correlated performance on a
test-wiseness test and a general mental ability test on a sample
of 306 eleventh grade subjects, and found a significant relation
between a measure of general ability and TW.

Taken as a whole, the bulk of the research literature seems
to indicate that a substantive correlation is typically found
between TW and tests of mental ability, allowing for a tangible
amount of shared variance. Apparently, however, these findings
have not satisfied other authors in the field of TW, for the above
articles are generally selectively cited as providing evidence
that TW and intelligence are not correlated significantly. Thus,
Dillard, Warrior-Benjamin, and Perrin (1977) maintained, "Kreit
(1967) found that improved test-wiseness and intelligence were not
significantly related" (p. 1135). Likewise, Crehan, Gross, and
Koehler (1978) cited Diamond and Evans and reported, "previous
research has shown that TW is not highly related to cognitive
ability" (p. 40). Crehan, Koehler, and Slakter (1974) also cited
Diamond and Evans and reported, "investigators examining the
cognitive correlates of TW have concluded that TW is not highly
related to cognitive ability" (p. 209). This myth has also been
maintained by those who simply assert that students equal in
intelligence may differ in TW. For instance, Gross (1977)
asserted "(TW) concerns the extent to which examinees of similar
ability or achievement received different test scores as a result
of differences in test-taking shrewdness" (p. 97). Wahlstrom and
Boersma (1968) asserted "... test-wise Ss often receive higher
scores than Ss of equal intellectual ability" (p. 419).

It can, therefore, be seen that in spite of substantial
evidence linking general reasoning ability and measures of
test-wiseness, researchers have continued to report the lack of a
relation between the two variables. The reason for this is
uncertain, although it no doubt reflects in part an interest in
(a) defending the construct of TW as one separate from
intelligence, and (b) consequently, implying that such ability is
easily trained and manipulated. To this end, relevant data have
been misinterpreted, or simply ignored. In addition to the
empirical findings of correlations between TW and intelligence,
and the methodological errors of those who maintain there is no
such relation, an appeal to "common sense" can be made. High on
the list of Millman, Bishop, and Ebel's analysis of test-wiseness
is what is referred to as "deductive reasoning strategies," among
which are included elimination of options known to be incorrect,
elimination of options which imply the correctness (or
incorrectness) of each other, utilization of relevant content
information in other test items, and choice of items which
encompass all of two or more given statements known to be correct.
Other strategies include a deduction of the intent of the test
constructor and a determination of regularities in stem or option
cues on the part of the test constructor. It would defy
credibility to assert that these "deductive reasoning" strategies
are not related to general mental ability.

As with most myths, however, elements of truth remain. If it
is obvious that many test-taking strategies are strongly dependent
upon the reasoning skills of the test-taker, it is also obvious
that some other strategies can be easily taught and involve little
reasoning ability. These include such strategies as working
quickly, moving past items which resist a quick response,
answering all questions, using time remaining after the completion
of tests to reconsider answers, asking the examiner for
clarification of ambiguous questions, guessing whenever necessary,
and developing prior familiarity with specific test format
demands. These strategies also comprise a component of test-
wiseness and have been successfully taught to mildly handicapped
students at the primary-age level, to the extent that performance
on achievement tests has been enhanced (Scruggs, 1984a, b; Scruggs
& Mastropieri, in press b). Although such strategies as those
previously mentioned do not typically appear on tests of "test-
wiseness," these strategies may be, in fact, somewhat independent
of intelligence and therefore subject to relatively simple
remediation. To this extent, then, the claim that test-wiseness is
not related to intelligence does have some support. To the
extent to which this myth has been reported in the literature,
however, it must be challenged; that is, TW is not a construct
which students happen to acquire by chance or serendipity, which
is unrelated to intelligence, and which results in substantial
fluctuations of scores on achievement tests.

Myth #2: TW Constitutes a Large Source of
Variance/TW Cues Are Commonly Found on Tests
Although it is clear that some students are less able to
"outguess" certain test items than their "test-wise" peers, the
issue at stake in this particular myth revolves around whether or
not the amount of variance associated with TW is large. Some
authors have simply reported that TW is a potential source of
error. Gross (1977) argues, "Millman, Bishop, and Ebel (1965)
have advocated that TW be taught to minimize inter-examinee TW
differences, thereby reducing measurement error . . ." (p. 97).
Gross (1977), referring to Ebel (1965), writes, "more error in
measurement is likely to originate from students who have too
little, rather than too much, skill in taking tests" (p. 97).
Sarnacki (1979) writes, "TW is widely recognized as a source of
additional variance in test scores and is a possible depressor of
test validity" (p. 253). Some authors, however, have magnified
the importance of this argument and have written that, in fact,
the source of error in test-wiseness is extensive. Thus,
Wahlstrom and Boersma (1968) maintained, "an important source of
variation in test scores is test-wiseness" (p. 413). McPhail
(1978) argued, "test-wiseness operates as error variance and its
effect is to reduce the validity and reliability of tests"
(p. 168). Kalechstein, Kalechstein, and Doctor (1981) maintained,
"test-wiseness has been considered a potentially large source of
error variance" (p. 198).

That TW constitutes a source of error variance is
indisputable. The question here is whether, in fact, TW
constitutes a large source of variance and whether TW cues are
commonly found in tests. The argument for the magnitude of the
effect of TW derives largely from a confusion between the terms
"statistically significant" and "practically important." For
example, Sarnacki (1979) cites a number of studies for which
statistically significant increases in test scores were associated
with training in TW (e.g., Callenbach, 1973; Gross, 1976; Oakland,
1972). Although Sarnacki is correct that these researchers did,
in fact, produce a "significant" increase in test scores as a
result of training in TW, the fact is that in virtually all cases,
the effect sizes were quite small (this issue will be discussed
further under the "easily trained" myth). In fact, the very
studies that Sarnacki cites are stronger arguments in favor of the
position that TW is a relatively small source of variance in
achievement test scores. One specific study is worthy of mention.
Sarnacki cites Gross (1976) as evidence that significant increases
in test scores were associated with training in TW. A review of
this dissertation, however, demonstrates that three selected TW
behaviors were taught. These behaviors included risk taking,
deductive reasoning, and time using. The dependent measure was
the Metropolitan Achievement Test (MAT) Advanced Battery. Gross
concluded that (a) deductive reasoning was not successfully taught
(see "TW not correlated with IQ" myth), (b) risk taking (i.e.,
guessing) exerted a significant influence on test score only when
guessing was inhibited in control conditions, and (c) although
time using was successfully taught, it did not affect test score.
Thus, the very dissertation cited by Sarnacki suggests that TW
constitutes a relatively small source of variance.

In one of the most thoughtful investigations of TW, Rowley
(1974) administered vocabulary and mathematics test items in both
free response and multiple choice formats. Partial correlations
were computed between scores on multiple choice items and measures
of TW and risk-taking (RT), with free response scores partialed
out. Rowley found significant partial correlations between
vocabulary scores and TW and RT measures, and concluded that use
of multiple choice tests "can result in high risk-taking, test-
wise examinees scoring more highly than other examinees whose
knowledge and ability are the equal of theirs" (p. 21). Analysis
of the actual extent of the performance advantage of students high
in TW is difficult, because gain scores (from free response to
multiple choice) were not reported. Examination of correlational
data, however, indicates that TW and RT were not correlated at all
with mathematics multiple choice items (partial r's near 0) and
that the partial correlations with vocabulary items were not high
(r's of .27 and .14 for TW and RT, respectively) when guessing was
not penalized (see Gross, 1976). In this investigation, then, TW
was seen to account for 7% of the variance in vocabulary test
performance, while RT accounted for less than 2% of total
vocabulary test variance. When this finding is considered with
the near zero correlations between TW, RT, and mathematics test
performance, the conclusion that such factors constitute a large
source of variance is difficult to justify.
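
The variance figures above follow from squaring the reported
partial correlations. The sketch below reproduces that arithmetic
in Python. The function names are illustrative only and do not come
from Rowley (1974); the partial correlation helper uses the standard
first-order formula and is included only to show what "partialed
out" means, since the zero-order correlations are not reported here.

```python
import math


def variance_explained(r: float) -> float:
    """Proportion of variance shared by two measures, i.e. r squared."""
    return r ** 2


def partial_correlation(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Partial r's reported in the text for vocabulary items.
reported = {"TW": 0.27, "RT": 0.14}
for label, r in reported.items():
    share = variance_explained(r)
    print(f"{label}: r = {r:.2f}, variance explained = {share:.2%}")
```

Squaring .27 gives roughly 7% and squaring .14 gives just under 2%,
which matches the percentages cited in the paragraph above.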

Another argument in favor of the "large source of variance"
myth comes from analyses of tests themselves. Metfessel and Sax
(1958) looked for bias in placement of key to correct answers and
found that more questions were keyed "true" on true-false tests
than "false." They argued that 42% of the tests that they studied
were found to have answer placement flaws that may conspire with
response sets to artificially inflate scores. Even if these data
are accurate, the point remains that test-takers would need to
know ahead of time in which direction keyed items were biased in
order to derive any benefit from these flaws. The strongest
argument with respect to Metfessel and Sax's analysis, however, is
that although they document the possibility of placement flaws
which may artificially inflate scores, they offer no quantitative
data showing that these cues actually do result in inflated scores.

In order to investigate more fully whether TW cues are
commonly present in achievement tests, the present authors have
recently examined five major standardized achievement tests
(California Achievement Test, Metropolitan Achievement Test,
Comprehensive Test of Basic Skills, Iowa Test of Basic Skills, and
Stanford Achievement Tests) for presence of TW cues, including
specific determiners, similar options, stem options, or absurd
options as defined by Slakter et al. (1970). We independently
evaluated all test items for the presence of these cues and
afterwards computed a 96% coefficient of agreement on TW cues.
Nevertheless, we found that such TW cues exist in less than half
of 1% of items on all these tests, substantially different from
the "large source of variance" TW cues are supposed to encompass.
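
The text does not say how the coefficient of agreement was computed.
As a minimal sketch, assuming it was a simple percent agreement
between two raters who coded each item for the presence or absence
of a TW cue, the calculation could look like the following. The item
codes are hypothetical and were chosen only so that the example
yields 96%; they are not the authors' data.

```python
def percent_agreement(rater_a, rater_b):
    """Share of items on which two raters gave the same code."""
    if len(rater_a) != len(rater_b):
        raise ValueError("both raters must code the same items")
    if not rater_a:
        raise ValueError("no items to compare")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cue_rate(codes):
    """Share of items flagged as containing a TW cue (code 1)."""
    return sum(codes) / len(codes)


# Hypothetical codes for 25 items: 1 = TW cue present, 0 = absent.
rater_a = [0] * 24 + [1]
rater_b = [0] * 23 + [1, 1]
print(f"agreement = {percent_agreement(rater_a, rater_b):.0%}")
print(f"cue rate (rater A) = {cue_rate(rater_a):.1%}")
```

Simple percent agreement does not correct for chance agreement, which
matters when the coded feature is rare, as TW cues were found to be.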

Another argument which can be made is that although such cues
are not commonly present in standardized tests, they are present
to a large extent in teacher-made tests. To this end, some
studies have indicated that training in TW skills does not
critically influence performance on standardized achievement tests
but does influence performance on multiple-choice tests with
poorly made distractors, which are then argued to be
representative of teacher-made tests. Thus, Wahlstrom and Boersma
(1968) have argued that TW training increases scores on "poorly
made" tests but does not increase scores on standardized test
items. Although there may or may not be some truth to this
argument, there is a logical flaw in it. Those who advocate
training in TW to improve scores on poorly constructed test items
are in essence arguing that teachers should teach their students
how to outguess their poorly constructed tests. Such an argument
is not logically sensible and, in addition, suggests that
outguessing test items for which the content is not known would
result in more, rather than less, measurement error. At any rate,
the interests of the teacher and students would be better served
by putting additional time into training the teacher to construct
better items, rather than teaching the students to outguess them
more effectively.

Myth #3: Cultural Differences Exist in TW
It has been assumed as far back as the "codification" of TW
in the original article by Millman, Bishop, and Ebel (1965) that
TW of the type found on objective tests is culturally determined.
One of the more widely cited references to this myth is by Millman
and Setijadi (1966), who compared the performance of American and
Indonesian students on open-ended and multiple-choice questions.
The American students enjoyed an advantage on the objective
questions, even after the Indonesian students were familiarized
with the mechanics of choosing the correct answer. Furthermore,
Lo and Slakter (1973) compared Chinese and American students on an
instrument meant to measure TW and risk-taking in test
circumstances. These two articles have been commonly cited by
researchers as evidence that some ethnic/cultural groups in the
United States may score lower on achievement tests because of
"cultural" differences in TW. This possibility has led to much
research on training American minority groups on TW skills.
Often, however, deficiencies in TW exhibited by minority groups
have simply been assumed rather than documented. Slakter, Koehler,
and Hampton (1970) maintain "the objectives of [a TW] learning
program would be not only to decrease the error of measurement
mentioned by Ebel (1965, p. 206) but to decrease the handicap
under which many examinees apparently operate. For example,
certain subsets of the population (black students, rural students,
etc.) score lower on achievement tests than the population at
large" (p. 253). The assumption by these authors is that much of
the difference in achievement test scores is due to cultural
influences in TW, and not lower levels of achievement in general.

Evidence presented to support the assertion that minority
groups lack TW, however, is often tenuous. For example, when
Kalechstein, Kalechstein, and Doctor (1981) cited Ortar (1960),
among others, in their statement, "several investigators have
noted the lack of test-wiseness in culturally different children"
(p. 198), they implicitly referred to American minorities. Ortar
actually speaks of the difficulties in using standardized tests
when faced with a culturally diverse population, stating that
under such circumstances, the assumption of equality of past
experience cannot be made. It is not clear that this statement is
accurate when applied to inner city, black, or lower socioeconomic
status students.

Most empirical studies attempting to document differences
in TW between ethnic/cultural groups consist of either (a) the
administration of a TW instrument to different cultural groups, or
(b) attempts to evaluate the impact of TW training on the
subsequent scores on a TW instrument or a real standardized test.
Despite the concern expressed by many researchers (e.g., Ebel,
1965; Ortar, 1960) that score differentials may be related to
between-group deficits in TW, relatively little research has
focused on identifying that deficit. For example, Kalechstein et
al. (1981) cited previous investigators who have described the
lack of TW in culturally different/disadvantaged groups, but
themselves administered a TW training program to a group of black,
disadvantaged second graders without reference to a supposedly
"advantaged" group. However, it may be that all second graders as
a group are relatively inexperienced with tests in general. The
performance of black second graders after exposure to a TW
treatment in the absence of comparison to other groups, therefore,
tells us relatively little concerning cultural group differences
in TW. Thus, Kalechstein et al. have not established that
achievement tests are less valid for the group they studied. What
they have done is replicated the study by Callenbach (1970) with a
different population and raised questions not directly addressed
in their own investigation. Likewise, Dreisbach and Keogh (1982)
successfully taught TW skills to Mexican-American children and
commented "test-wiseness may be particularly important when
testing children from economically disadvantaged backgrounds
and/or where the primary language of the home is not standard
English" (p. 228). Although language of test administration and
language competence of the child were also investigated, the
primary focus of this investigation was the hypothesis that
Mexican-American children "lack 'test-wiseness' and thus do poorly
on tests" (p. 224). Differential effects of training for low SES
or minority populations, however, were not investigated in their
study, which leaves unanswered the issue of whether such training
is in fact "particularly important" for low SES or minority
populations.

In contrast to the questionable support for cultural/minority
differences in TW, there is evidence that these groups differ
little with respect to TW. In a dissertation by Yearby (1975) in
which SES, race, and sex were controlled, no significant
differences were observed between the groups on the test-taking
skills pretest. Another study which directly addressed the
question of whether disadvantaged or minority populations lack TW
was conducted by Diamond, Ayres, Fishman, and Green (1976).
Although the study was clearly designed to indicate relative
deficiencies in TW on the part of black inner-city children,
support for this hypothesis was not found. It was found that
black inner-city children performed significantly above chance on
a TW instrument, and that scores on the TW instrument did not
predict grades on the Verbal Achievement subtest of the California
Achievement Test. This suggests that it can be assumed neither
that disadvantaged or minority groups lack TW, nor that a relation
between TW and achievement test scores exists in these groups. In
a review by McPhail (1976), it was concluded that "TW studies
conducted on black and other minority student populations . . .
have been inconclusive" (p. 168). Although it may be argued that
direct evaluations of relative levels of test-wiseness in minority
and nonminority groups are lacking, it must be maintained that at
present the assertion that American minority groups are lower in
test-wiseness, and that this "deficiency" is responsible for much
of the performance differences between groups, is largely
unsupported.

As in most contemporary myths, however, a degree of truth can
be discerned. Although studies which compare the effectiveness of
test-wiseness training between minority and nonminority groups
have not been found, a recent investigation does offer some
support for the "cultural difference in TW" issue. Through meta-
analysis procedures, Scruggs, Bennion, and White (in press) have
been able to make quantitative comparisons in the effectiveness of
TW training on achievement test scores of minority and nonminority
groups which were not directly assessed by individual studies.
Scruggs et al. evaluated 24 empirical studies which investigated
the effects of TW training on elementary school students, grades 1
through 6. It was found that with less than 4 hours of treatment,
neither "low SES" nor "not low SES" subjects benefited appreciably
(average effect sizes of -.05 and .08). With more than 4 hours of
treatment, students from low socioeconomic backgrounds benefited
more than twice as much as students who were not from low SES
backgrounds (average effect sizes of .44 vs. .20). Since low SES
subjects under these circumstances appeared to benefit more than
twice as much as their counterparts from higher SES groups, the
finding implies that children from low SES backgrounds
are somewhat deficient with respect to TW. In addition, most
students representing low SES groups in the studies evaluated were
also members of inner city minority groups. It must be noted,
however, that the effect size differential between students from
low SES and not low SES backgrounds receiving 4 or more hours of
treatment was .24 standard deviation units, a relatively small
difference which in no way could account for the large performance
differences seen between SES groups on achievement tests.
Although the Scruggs et al. (1984) study provides some evidence
that students from low SES and minority backgrounds may suffer
somewhat with respect to TW skills, these deficiencies explain
little of the performance differences between the two groups.
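
The size of the .24 differential can be illustrated with a short
Python sketch. The two effect sizes are those reported above. The
conversion to percentiles is an added assumption, not part of the
meta-analysis: it treats achievement scores as normally distributed
and asks where the average trained student would fall in the
untrained distribution.

```python
import math


def percentile_of_average_student(effect_size: float) -> float:
    """Percentile reached by the average trained student, assuming
    normally distributed scores (standard normal CDF times 100)."""
    return 50.0 * (1.0 + math.erf(effect_size / math.sqrt(2.0)))


# Average effect sizes reported in the text for 4 or more hours of training.
low_ses = 0.44
not_low_ses = 0.20

differential = low_ses - not_low_ses
ratio = low_ses / not_low_ses

print(f"differential = {differential:.2f} SD, ratio = {ratio:.1f}")
for label, es in (("low SES", low_ses), ("not low SES", not_low_ses)):
    pct = percentile_of_average_student(es)
    print(f"{label}: about percentile {pct:.0f}")
```

Under the normality assumption, both groups move only modestly above
the 50th percentile, which is consistent with the argument that the
differential is too small to explain large between-group differences.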

Myth #4: TW Is Easily Trained and Results
in Large Gains in Test Performance
This myth is related to the "large source of variance/
commonly found" myth in which statistical significance has been
confused with practical importance. For example, Sarnacki (1979)
referred to Gaines and Jongsma as having concluded "that TW can be
taught in a relatively short amount of time with significantly
higher performance on standardized tests resulting." Sarnacki goes
on to cite several others who "significantly" raised achievement
test scores by TW training (e.g., Callenbach, 1973; Gross, 1976;
Wahlstrom & Boersma, 1968). A count of significant versus
nonsignificant differences, however, says
little about the relative size of the effect of training. In a
recent meta-analysis, Bangert-Drowns, Kulik, and Kulik (1983)
indicated that training in TW resulted in average effect sizes on
achievement test scores of .29. On the primary grade levels, this
effect size would be equivalent to approximately three months of
academic achievement, not a large difference by educational
standards. In a more recent meta-analysis, however, using
somewhat different criteria for evaluating effect sizes, Scruggs,
Bennion, and White (in press) determined that the average effect
size in the elementary grades for raising scores on achievement
tests was .10, less than half of that reported by Bangert-Drowns
et al., reflecting grade equivalent increases of questionable
significance. It was only after relatively long-term training
(i.e., longer than four hours) that the resulting effect sizes
began to resemble those reported by Bangert-Drowns et al. This
finding, demonstrated by meta-analysis at the elementary grade
level, has recently been demonstrated to be true with college-bound
students on the Scholastic Aptitude Test (DerSimonian & Laird,
1983). Thus, it appears that the notion that TW is easily trained
and results in substantially higher test scores is unjustified.
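
To put the two meta-analytic averages on a common footing, the
following Python sketch rescales an effect size into months of
achievement. It assumes, purely for illustration, that the rate
implied by the text (.29 standard deviations is about three months
at the primary level) holds proportionally for smaller effects; the
function name and default arguments are hypothetical.

```python
def months_equivalent(effect_size, ref_effect_size=0.29, ref_months=3.0):
    """Scale an effect size to months of achievement, assuming the
    months-per-SD rate implied by the reference pair stays constant."""
    return effect_size * ref_months / ref_effect_size


for label, es in (("Bangert-Drowns et al.", 0.29), ("Scruggs et al.", 0.10)):
    months = months_equivalent(es)
    print(f"{label}: effect size {es:.2f} is about {months:.1f} months")
```

On this rough scaling, an average effect size of .10 corresponds to
about one month of achievement, which is why the grade equivalent
increases are described as being of questionable significance.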

Another argument that TW is easily trained comes from
researchers who trained selected aspects of TW and measured
performance on the basis of a TW instrument (e.g., Gibb, 1964;
Slakter et al., 1970; Moore, Schutz, & Baker, 1966). It was found
that TW training does substantially and easily increase scores on
TW instruments, and these findings have been supported by the
meta-analysis of Scruggs et al. (in press). Although this type of
training does seem to be effective in promoting scores on TW
tests, the extent to which this training raises scores on actual
tests remains relatively small. Another argument offered by those
who maintain TW is "easily trained" is that, although TW cues are
not common on standardized achievement tests, they are common on
flawed teacher-made tests, and it is on these types of tests that
TW training is most beneficial. This issue has been addressed
above. Although it seems absurd for teachers to teach their
students to "outguess" their own poorly constructed tests, the
idea of training teachers to construct better test items is often
dismissed out of hand. Sarnacki (1979) argues unconvincingly that
even if teachers are trained in the principles of TW, item faults
may still occur. One may just as easily assert that students may
forget some of the TW skills they were taught. In fact, if the
same amount of time were spent training teachers to construct
better test items, it is logical to assume that less, rather than
more, error would result than if students were trained to guess
correctly the answers to questions they do not understand.

In summary, it can be stated that (a) relatively small gains
in standardized test performance have been achieved only after
extensive training, and (b) although effects are greater for
poorly constructed items, training in this area is more difficult
to justify.

In spite of this present, rather pessimistic appraisal of the
"easily trained" myth, however, a positive hypothesis, which has
only recently received some research support, does remain.
Although group differences with respect to TW training have been
relatively small, it is possible that there exist certain
individuals (or small groups) for whom TW is both necessary and
beneficial, and for whom relatively large differences in
performance can be achieved. It has been seen that students
classified as mildly handicapped (i.e., learning disabled and
behaviorally disordered) may differ from their nonhandicapped
peers with respect to (a) attitudes toward tests (Scruggs &
Mastropieri, in press a), and (b) spontaneous production of
effective test-taking strategies, including the effective
utilization of test format (Scruggs, Bennion, & Lifson, in press
a), selection of an appropriate test-taking strategy (Scruggs,
Bennion, & Lifson, in press b), and use of prior or partial
knowledge and deductive reasoning (Scruggs & Lifson, 1985). A
recent experiment in TW training of regular third grade students
has indicated that TW training benefited the lower half of the
class much more than the upper half (Scruggs, Bennion, &
Williams, 1984). Such differences were seen to "wash out" when
scores of the trained group as a whole were combined. Finally,
successful training of test-taking skills has recently been
achieved in special education populations (Dunn, 1981; Lee &
Alley, 1981; Scruggs, 1984; Scruggs & Mastropieri, in press b).
The obtained effect sizes in these initial investigations have
tended to be somewhat larger than those obtained on nondisabled
populations, and there is the added feature that many of these
students are functioning within a level at which relatively slight
changes for better or worse on achievement test performance may
result in more serious decisions regarding educational placement.
In other words, although gains have typically been small and of
less consequence for normally achieving students, even relatively
small gains may be of greater importance to students functioning
at the lower end of the distribution. Also, mildly handicapped
groups do in fact exhibit less efficient test-taking strategies
than their nonhandicapped peers, and it would seem logical to
assert that these students should be trained to utilize the same
strategies that other students are spontaneously using.

Summary and Conclusions
The present review has attempted to critically evaluate four
contemporary myths associated with test-wiseness. In this
article, we have stated that (a) the disassociation of TW from
general cognitive ability has not been verified, (b) TW has not
been shown to constitute a large source of error variance in
tests, (c) American minority groups have not been shown to be
seriously lacking in TW, and (d) relatively modest improvement in
test scores has been achieved only through long and intensive
training in TW skills. Stated more positively, TW can be said to
be a tangible component of the test-taking experience but one
which nevertheless plays a relatively minor role in overall test
scores for most students.

Several implications can be drawn from this analysis for the
practicing school psychologist. First, in many individual cases,
it may be wiser to assume TW has played a relatively minor role in
test performance. Although teachers often explain a particular
student's poor test scores by asserting he/she is simply "a poor
test-taker," such reports may reflect either a well-intentioned
but misguided sympathy for the student, or simply a misreading of
the student's actual abilities. A psychologist who has been told
that a particular child's low scores reflect only poor test-taking
skills would be well advised to seek more tangible evidence that
this is truly the case. Second, if it can be demonstrated that a
given student is exceptionally weak in TW, there is little reason
to believe that that student could not be trained in TW skills.
Finally, in the case of special education students, it may be
advisable to ensure that all such students have had some
additional guided practice on unfamiliar test formats.

It can be concluded that although TW as a construct is weaker
and less pervasive than commonly assumed, there is nevertheless
tangible evidence of its (perhaps multifaceted) existence and some
indication that, although large groups tend to gain little from
specific training in TW, there may be certain individuals or
smaller groups for whom the construct of TW does constitute an
"important source of error." Further research in this area may do
much to ultimately clarify the issue of test-wiseness.

Abstract

Research describing academic and intellectual characteristics of
behaviorally disordered (BD) students is reviewed. Investigations
reviewed in this paper have focused on areas of intellectual,
academic, and psycho-social functioning as they pertain to school
achievement. In general, it has been found that BD students
exhibit academic deficiencies greater than those exhibited on
tests of intellectual functioning and perform below average in all
content areas, with particular discrepancies noted in math
functioning. In addition, variables such as locus of control,
responses to the test-taking situation, and attitudes toward
academic tasks may covary with academic performance.

Academic and Intellectual Characteristics of
Behaviorally Disordered Children and Youth

All students classified as behaviorally disordered (BD) by
definition are in need of programming designed to improve social
or emotional functioning. Since most of this programming occurs
in academic environments, however, it is important to know whether
students so classified also exhibit deficiencies with respect to
intellectual or academic functioning. If BD students are
generally found to be deficient in academic functioning, it may be
necessary to incorporate remedial instruction as a major component
of the educational environment. This review is intended to
synthesize academic and intellectual characteristics of
behaviorally disordered children and youth in order to provide a
basis for future research and practice.

Two data bases (Psychological Abstracts, ERIC) were examined
for data-based articles pertaining to academic and intellectual
characteristics of BD students. In addition, recent books on
behavior disorders (e.g., Kauffman, 1985) were reviewed for
sources. Finally, past issues of the journal Behavioral Disorders
and the series Monographs in Severe Behavior Disorders of Children
and Youth were examined for relevant articles. Articles were
included which selected a population on the basis of disturbances
in social or emotional functioning, exclusive of psychotic or
autistic samples. By these means, 25 articles reporting data were
located and are given in Table 1.

Insert Table 1 about here

The investigations reviewed here represent a wide range of 
samples of children and youths referred to as "behaviorally
disordered." To this extent, any general agreement between
investigations suggests broad generalizability. When research
reports disagree, however, interpretations are more difficult. 
In general, descriptions of academic and intellectual 
characteristics can be divided into three main areas: (a) 
intelligence, (b) achievement, and (c) psycho-social functioning 
and academic performance. 

Intelligence

Studies of intellectual functioning are of relevance to the
study of academic characteristics for two reasons: (a) IQ
consistently has been a strong predictor of academic achievement
(Kauffman, 1985), and (b) IQ scores can provide information
concerning ability/achievement discrepancies. The following
section describes the results of several investigations of
intellectual performance.

In 1964, Stone and Rowley reported a mean IQ of 96.5 (ranging 
from 62 to 135) for 116 children referred for psychiatric 
services. Graubard (1964) found 21 delinquent or neglected boys 
in psychiatric residential treatment for two to eight years to 
have a mean IQ of 92.3 (range 71 to 108). Schroeder (1965)
reported that for 106 students classified as psychosomatic,
aggressive, exhibiting school difficulties, school phobic, or
neurotic, the average IQ was 95.95. Motto and Lathan (1966)
studied 47 school-age children in a state hospital and reported
that, as a group, they were in the dull normal range of general
intelligence. Glavin, Quay, and Werry (1971) reported IQ ranges
of 89 to 112 for 11 conduct problem children placed in special
classrooms. Fuller and Goh (1981) examined 38 learning disabled
and 42 emotionally disturbed public school children and reported
lower average IQ scores for the LD than for the ED students (86.13
and 89.50, respectively). As recently as 1983, Forness, Bennett,
and Tose reported that 92 subjects (23 girls and 69 boys) who had
been inpatients at a neuropsychiatric institute had, on the
average, IQ scores in the low 90s.

Reilly, Ross, & Bullock (1979) examined the intellectual 
performance of 177 adjudicated adolescents and reported a mean IQ
score of 90.26, a figure consistent with that of a previous
investigation (Bullock & Reilly, 1979). In addition, these
researchers reported that subjects scored near average on the
Picture Arrangement subtest of the Wechsler Intelligence Scale for
Children - Revised (WISC-R), which requires visual sequencing of
simple stories, but lowest on those verbal subtests which require
knowledge of the "outside world": Information, Similarities, and
Vocabulary. Finally, a relation between IQ performance and
violent behavior was not found in this investigation. 

Research on intellectual performance of disturbed children
reveals that the majority of mildly and moderately disturbed
children fall only slightly below average in IQ. These
investigations, taken together, appear to suggest that mild
academic deficiencies could be predicted on the basis of observed
intellectual functioning. Scruggs and Mastropieri (1984) pointed
out that IQ scores in combination with achievement test scores can
provide information regarding relative discrepancies between
ability and academic performance of the behaviorally disordered
population. What IQ scores cannot do is describe behaviorally
disordered students' actual levels of academic performance.
Kauffman (1985), however, does maintain that IQs of disturbed
children are the best predictors of future educational
achievement. The following section describes investigations of
academic functioning.

Achievement 

Reading and Arithmetic 

Silberberg and Silberberg (1971) reviewed research on school 
achievement and delinquency. They cited early studies by Lane and 
Witty (1934), Bond and Fendrick (1936), Sullivan (1927), and Hill
(1935), who found that, in general, delinquents were deficient in
reading achievement.

Tamkin (1960), whose subjects included 34 children receiving
residential treatment for emotional disorders, reported both the
arithmetic and reading grade ratings to be within the range
commensurate with the mean chronological age of the sample.
Arithmetic achievement was significantly lower than reading. Data
from the Wide Range Achievement Test (WRAT) showed that 32%
demonstrated some degree of educational disability, 41% were
educationally advanced, and the remaining 27% were at expected
grade level.

Stone and Rowley (1964) tested 116 children referred for 
psychiatric services using the WRAT. The majority of children 
fell below the expected level of achievement in reading and
arithmetic on the basis of both chronological and mental ages.
These children also scored significantly lower in arithmetic than
reading. In actual grade placement, a larger proportion were in
grades below those expected on the basis of chronological age.
Likewise, Reilly, Ross, and Bullock (1979) reported that academic
performance was deficient in all areas, with arithmetic scores
consistently lower than reading. In addition, Reilly et al.
(1979) reported that violent offenders had the lowest reading
scores. In a related investigation, Bullock and Reilly (1979)
reported lower achievement in all content areas on a similar
sample of youthful offenders. Additionally, greatest achievement
deficiencies were found for male, minority, and older subjects.


Graubard (1964) compared the performance of 21 children in a 
psychiatric residential treatment center. Using the Metropolitan
Achievement Test and the Stanford Achievement Test, he reported 
severe reading and arithmetic disability by comparing mental age 
to expected reading and arithmetic achievement. No evidence 
supporting a significant difference between reading and arithmetic 
achievement was found. 

Schroeder (1965) compared the WRAT scores of 106 students
classified as having emotional problems (psychosomatic,
aggressive, school difficulties, school phobia, or neurotic
personalities). The mean scores were consistently lower in
arithmetic than reading in all five categories. The school
difficulties category included the lowest mean achievement level
in arithmetic and reading. The highest grade equivalent composite
mean was reported in the neurotic-psychotic category. Emotionally
disturbed children were deficient at all age levels with respect
to school achievement. Schroeder concluded that academic
disabilities are concomitant with emotional disturbance and vice
versa.

Glavin and Annesley (1966) administered the California
Achievement Test to 90 normal boys and 130 behaviorally disturbed
boys (who were further divided into conduct problem, withdrawn,
and inadequacy-immaturity groups) in public school. Their
findings showed 81.5% of the BD group were underachieving in
reading and 72.3% underachieving in arithmetic. Academic failure
can be expected in a high proportion of delinquent or conduct
disordered children according to the review of Silberberg and
Silberberg (1971); Glavin and Annesley (1966) found no significant
differences in performance between the conduct disordered and the
withdrawn group.

Motto and Lathan (1966) found no significant difference in 
the uniformity of achievement in reading and arithmetic of 47 
school-age children from a state hospital. The children were
below expectations based upon chronological and mental ages.
However, they did find more pronounced retardation in males. 

Forness, Bennett, and Tose (1983) found similar results
comparing 92 children who had been inpatients at a
neuropsychiatric institute. Both boys and girls scored below
expected levels on the Peabody Individual Achievement Test,
although 12-year-old boys were lowest in reading recognition and
reading comprehension. In a similar investigation (Forness,
Frankel, Caldon, & Carter, 1979), 34 hospitalized patients
exhibited deficiencies in all academic areas, particularly math 
and spelling. 

Fuller and Goh (1981) compared 38 learning disabled and 42 
emotionally disturbed public school children. The Wide Range 
Achievement Test scores of LD children were lower than those of
BD children on reading, spelling, and math. This was not so,
however, on the Minnesota Percepto-Diagnostic Test, although no
statistical tests were computed on the results.

Harris and King (1982) compared academic achievement of
children classified as having learning problems, behavior
problems, learning and behavior problems, or "no problems." They
studied scores of 242 public school children administered the
Science Research Associates (SRA) Achievement Tests. Those
children with learning problems scored lower than the children
with no problems. Those with behavior problems did not differ
from the no problem category on the SRA subtests of Reading, Math,
Science, and Use of Sources, but did differ from all groups on
Language Arts and Social Studies. The learning and behavior
problem group performed lower than all groups on the SRA.

Epstein and Cullinan (1983) also found that for 16 matched
pairs (IQ, sex, chronological age, ethnicity) of learning disabled
and behaviorally disordered public school students, the BD
students scored significantly higher than the LD students on all
subjects except the general information subtest of the Peabody
Individual Achievement Test (cf. Reilly, Ross, & Bullock, 1979)
and the math subtest of the Wide Range Achievement Test. These
researchers suggested that differential academic programming may
be indicated for LD and BD children.

In contrast, Scruggs and Mastropieri (1984) investigated the
Stanford Achievement Test scores of 1480 primary grade special
education students (619 learning disabled and 863 behaviorally
disordered) in several different content areas. They concluded
that the LD and BD children were, in fact, very similar with
respect to academic performance, with LD children scoring slightly
but consistently higher than BD children. No consistent
reading-math discrepancy was noted in either population. The
variability of BD students' performance also descriptively
exceeded that of LD students; thus, a wider range of academic
achievement among BD students may be expected.

In contrast to the above studies, one investigation reported
results which suggested that BD students do not exhibit academic
deficiencies. Graubard (1971) examined the reading achievement
and behavior checklist scores of 108 emotionally disturbed 
children and concluded, "... groups' reading commensurate with
MA and several groups' reading commensurate with CA" (p. 757).

Graubard added, however, that academic retardation in his sample
was associated with severity of conduct disorders. Unfortunately,
no data were offered to support these conclusions.

Spelling

Few studies in subjects other than reading and arithmetic 
have been conducted. Glavin and DeGirolamo (1966) found 
differences between withdrawn and conduct disordered students with 
respect to types of spelling errors. The withdrawn children made 
significantly more written spelling errors, while the conduct 
problem children made significantly more refusals (i.e., refused
to complete the task). They concluded that children with
emotional problems may show patterns of spelling errors which 
differ both quantitatively and qualitatively from those of normal 
children. In addition, as mentioned above, Fuller and Goh (1981) 
found that learning disabled students scored lower than 
emotionally disturbed students on tests of spelling achievement. 

Psycho-Social Functioning and Academic Performance 

The present review of previous investigations can offer 
little evidence that the reported academic deficiencies of BD
children are content specific; that is, research findings tend to 
support the notion that BD students are deficient in all areas of 
academic functioning, with some individual investigations 
reporting more serious deficits in math. Research which has 
examined academic performance in several different areas within 
one investigation has supported this conclusion (e.g., Scruggs & 
Mastropieri, 1984). However, several other researchers have 
investigated the interaction of academic performance and measures 
of psycho-social functioning. One major purpose of these
investigations, described below, is to identify possible causal
explanations for academic deficits. 

Glueck and Glueck (1950) reported that delinquents exhibited 
more dislike for school subjects requiring strict logical
reasoning and persistency of effort as well as those dependent
upon efficient memory skills. This finding may partially explain
some of the previous reports of differentially low performance in
math. School achievement of the delinquent students was far below
that of nondelinquents.

Graubard (1965) found that 35 delinquents incarcerated at a
residential treatment center had similar communication patterns to
those of non-adjudicated adolescents. The author maintained,
however, that deficits were exhibited in the visual-motor channel
(integration level). Delinquents also were reported to exhibit
deficits in the Auditory Vocal Automatic modality and in
directionality. Findings reported in this investigation, however,
may be complicated by reliability and validity limitations of the
measures administered (i.e., Illinois Test of Psycholinguistic
Abilities, Harris Test of Lateral Dominance).

Two investigations examined locus of control and academic
achievement with BD students. Hisama (1976) compared 48 special
education students with learning and behavior problems to 48
nonhandicapped students on a locus of control measure. It was
hypothesized that externality may be a factor for low achievement
motivation of behaviorally disordered and learning disabled
children. Hisama reported that the Children's Locus of Control
Scale showed no difference in scores between normals and LD and BD
students. It was concluded that the child with learning and
behavior problems may not be more externally oriented than the
normal child. In a similar study, Perna, Dunlap, and Dillard
(1984) found that for 63 males classified as mildly to moderately
emotionally disturbed, those students who felt a high degree of
self-responsibility for their successes and failures (internality)
showed greater academic gains.

Letteri (1979) provided a "Cognitive Profile" associated with 
low academic achievement and severe behavior problems as a result 
of research efforts with 200 subjects (some BD, some not). The 
cognitive processes associated with low achievement were said to 
include: simple (vs. cognitive complexity), leveler (vs.
sharpener), intolerant for ambiguous information, global or field
dependent (vs. analytical way of perceiving), broad (vs. narrow
inclusiveness in breadth of categorization), non-focuser, and
impulsive (vs. reflective). 

Four recent studies investigated attitudes and responses to 
achievement tests themselves. Scruggs, Mastropieri, Tolfa, and 
Jenkins (1985) examined attitudes expressed by BD students toward 
the test-taking experience. When surveys were administered
at the beginning of the school year, reported attitudes of BD and 
more average students were very similar. When administered 
immediately after three days of testing, however, BD students 
reported more negative attitudes than their regular class 
counterparts. Taking a different perspective, Forness and Dvorak 
(1982) examined the general question of academic performance of 
disturbed or behaviorally disordered students under different
testing conditions. Forty adolescents who had been inpatients at
a neuropsychiatric institute were tested using the Comprehensive
Test of Basic Skills under untimed conditions. Their scores were
compared with scores obtained at the end of the normal time limits
of the test. The only performance to increase under untimed
conditions was that of reading comprehension. Similarly, Scruggs
and Mastropieri (in press) trained a sample of mildly handicapped
students, mostly BD, on test-taking skills and reported a
significant performance advantage on reading subtests. This
finding suggests that BD students may be deficient with respect to
test-taking skills. In a more recent study, Scruggs, Mastropieri,
and Tolfa (1985) reported that test-taking skills training of BD
students had differentially raised scores on a "math concepts"
subtest over those of LD students to the extent that trained BD
students gained 16 percentile points over their untrained
counterparts. This finding may help explain why BD students'
achievement scores in math are often differentially low.

Conclusions

The investigations reviewed in this paper represent a wide 
range of populations, all considered in some way "behaviorally 
disordered." Different assessment measures have been used in a 
wide variety of different settings. In spite of the diversity of 
methods, measures, and population samples, however, some broad 
conclusions can be drawn and are given below. 

First, BD students consistently have been seen to exhibit
academic and intellectual deficiencies. Although several
investigations have examined the possibility of specific content
area deficiencies, all evidence to date indicates that academic
deficiencies exhibited by this population are global, with a
smaller set of investigators suggesting arithmetic performance may
be relatively lower than reading. In addition, deficiencies in
academic areas have typically been greater than intellectual
deficiencies. Investigators who examined ability/performance
discrepancies in BD children have indicated that academic
achievement is generally below levels predicted by ability tests.
These consistent results suggest that the need for academic
remediation in this population is as great as the need for
behavior management and social skills training.

Whether the reported academic deficiencies of BD students are
greater than those typically exhibited by learning disabled
students is less certain. Fuller and Goh (1981) and Epstein and
Cullinan (1983) reported that LD students scored lower on
achievement measures, while Scruggs and Mastropieri (1984)
reported that LD students scored consistently higher. In spite of
these discrepant findings, however, substantial academic
deficiencies have been reported in both populations. In addition,
BD students have exhibited consistently higher variability, due no
doubt to the fact that LD students are operating under an academic
"cut-off" level, while BD students are not.

In addition, several variables have been identified which may
partially explain observed academic deficiencies. These
potentially related variables include attitude toward school
subjects (Silberberg & Silberberg, 1971), external locus of
control (Hisama, 1976; Perna, Dunlap, & Dillard, 1984),
impulsivity (Letteri, 1979), and responses to test-taking
situations (Forness & Dvorak, 1982; Scruggs & Mastropieri, in
press; Scruggs, Mastropieri, & Tolfa, 1985; Scruggs, Mastropieri,
Tolfa, & Jenkins, 1985). Many of these investigations simply
describe characteristics of this population, however, and do not
provide information that these variables are, in fact, causally
related. Further research is needed to document more carefully
the reasons for the observed academic deficiencies.

Finally, it must be noted that research concerned with
optimal instructional strategies for this population has been
greatly neglected, given the nature and extent of the problem.
Epstein, Cullinan, and Rose (1980) referred to academic
remediation of BD students as an area "... of great concern to
special education practitioners, but, ironically, of less concern
to researchers" (p. 54). They described the several
investigations which had been conducted, virtually all of which
examined the role of token reinforcement in increasing academic
performance. Although some initial research has been conducted
which appears promising in evaluating the effect of such other
instructional variables as corrective feedback (e.g., Polsgrove,
Reith, Friend, & Cohen, 1979), increased instructional time (e.g.,
Reith, Polsgrove, Semmel, & Cohen, 1979), self-management (e.g.,
Cohen, Polsgrove, & Reith, 1979), peer tutoring (Scruggs,
Mastropieri, & Richter, in press), and cooperative vs.
competitive learning (Scruggs & Mastropieri, 1985), further
research is needed to refine these variables and to identify other
variables effective in remediating the serious academic deficits
of this population.

Table 1

BD Academic Characteristics Studies

AUTHORS: Bullock & Reilly (1979)
SUBJECTS: 188 adolescents adjudicated for behavioral offenses.
TASK: Wechsler Intelligence Scale, Wide Range Achievement Test
(WRAT).
RESULTS:
1. Average IQ of 90.
2. Average achievement deficit in all areas.
3. Discrepancies were greatest for males, minorities, older
   students.

AUTHORS: Epstein & Cullinan (1983)
SUBJECTS: 16 matched pairs (IQ, sex, CA, ethnicity); LD & BD;
public school students.
TASK: Peabody Individual Achievement Test (PIAT) and Wide Range
Achievement Test (WRAT) were administered to both groups.
RESULTS: BD students scored significantly higher than LD students
on all subjects except general information subtest of PIAT and
math subtest of WRAT.

AUTHORS: Forness, Bennett, & Tose (1983)
SUBJECTS: 23 girls and 69 boys who had been inpatients at a
neuropsychiatric institute; mean age 10.1 years.
TASK: Peabody Individual Achievement Test (PIAT) and Wechsler
Intelligence Scale for Children-Revised (WISC-R) were administered
to all students.
RESULTS:
Both girls and boys scored below expected levels on PIAT
(moderately).
Both girls & boys IQ in low 90's.
12 yr. old boys worse in reading recognition and reading
comprehension.
10 yr. old girls 2.1 yrs. below grade level.
12 yr. old girls 1.7 yrs. below grade level.

AUTHORS: Forness & Dvorak (1982)
SUBJECTS: 40 BD adolescents (15 males, 25 females) who had been
inpatients at a neuropsychiatric institute; mean age 15.7 years.
TASK: Comprehensive Test of Basic Skills (CTBS) was administered
and scored under timed and untimed testing conditions.
RESULTS: No significant test score differences, except on the
reading comprehension subtest.

AUTHORS: Forness, Frankel, Caldon, & Carter (1979)
SUBJECTS: 34 children (CA 7.0 to 12.9) hospitalized for severe
behavior disorders.
TASK: Peabody Individual Achievement Test (PIAT).
RESULTS:
1. Students were deficient in all academic areas, particularly
   math and spelling.
2. Longer hospitalization periods were associated with greater
   academic gains.

AUTHORS: Fuller & Goh (1981)
SUBJECTS: 38 LD and 42 ED children; public school setting; mean
age 10 years.
TASK: Wechsler Intelligence Scale for Children-Revised (WISC-R),
Wide Range Achievement Test (WRAT), and Minnesota
Percepto-Diagnostic Test (MPD) were administered to all students.
RESULTS:
Discriminant analysis procedures indicated that LD students & ED
students could be accurately placed.
LD's lower than ED on IQ, reading, spelling, and math, but not on
MPD (however, no statistical tests computed on results).

AUTHORS: Glavin & Annesley (1966)
SUBJECTS: 130 BD boys and 90 normal boys in public school
settings. (BD further divided into conduct problem, withdrawn, &
inadequacy-immaturity groups).
TASK: California Achievement Test (CAT) and Behavioral Scales
(Quay & Peterson, 1967).
RESULTS:
1. 81.5% of the BD group were underachieving in reading.
2. 72.3% of the BD group were underachieving in arithmetic.
3. No significant differences in performance were found between
   the conduct disordered group & the withdrawn group.

AUTHORS: Glavin & DeGirolamo (1966)
SUBJECTS: 9 ED and 9 regular education students; public school
setting. 15 ED students classified as either conduct disordered or
withdrawn, and regular education students.
TASK: Spelling words from Gates' A List of Spelling Difficulties
in 3876 Words (1937) were administered to both groups.
RESULTS:
1. ED students made more "internal" errors and fewer "external"
   errors than regular students.
2. Withdrawn students wrote significantly more unrecognizable
   words.
3. Conduct disordered students made significantly more "refusal"
   errors.

AUTHORS: Glavin, Quay, & Werry (1971)
SUBJECTS: Conduct problem children placed in experimental special
classrooms; 50% Afro-American; IQs 89-112; 1967, N=11, mean age
108 months (age range 91-132); 1968, N=12, mean age 112 months
(age range 89-131); both years, N=8.
TASK: 1967, Wide Range Achievement Test (WRAT); 1968, California
Achievement Test (CAT) pre- and post.
RESULTS:
1. 1968 arithmetic gain 1.7 years.
2. 1967 arithmetic gain .1 years.
3. 1968 reading gain 1.2 years.
4. 1967 reading gain .5 years.
5. 1968 greater emphasis on academic achievement.
6. Gain indicates program brings changes in specific
   learning-related behavior and obtains concomitant gains in
   academic achievement.

AUTHORS: Graubard (1971)
SUBJECTS: 108 disturbed students in special schools.
TASK: Reading Achievement, Behavior Problem Checklist.
RESULTS:
No overall reading deficiency.
Observed deficiencies associated with severity of conduct
disorder.

AUTHORS: Graubard (1965)
SUBJECTS: 35 disturbed delinquents incarcerated at residential
treatment center; age range 8 years 6 months to 10 years 11
months.
TASK: Wechsler Intelligence Scale for Children (WISC),
Metropolitan Achievement Test (MAT), Illinois Test of
Psycholinguistic Abilities (ITPA), Monroe Test of Auditory
Blending (MTAB), and Harris Test of Lateral Dominance (HTLD).
RESULTS:
BD students did not differ from normals in communication pattern.
BD students have deficits in the visual-motor channel (the
integration level).
BD students have deficits in the Auditory Vocal Automatic
modality and in directionality.

AUTHORS: Graubard (1964)
SUBJECTS: 21 children in psychiatric residential treatment for
2-8 years (delinquent or neglected); mean age 13 years 10 months
(range 10-16); mean grade 7.9 (range 5-11); mean IQ 92.3 (range
71-108); all boys.
TASK: Wechsler Intelligence Scale for Children (WISC),
Metropolitan Achievement Test, Stanford Achievement Test.
RESULTS:
Difference between reading and math not significant; mean grade
rating both tests 4.75; mean grade reading comprehension 4.87;
mean grade arithmetic computation 4.62.
Educational disability measured by comparing mental age to
reading and arithmetic ages. Severe reading and arithmetic
disability found.
Not achieving commensurate with mental ages and disabled in
academic achievement.
No evidence supporting significant difference between reading and
arithmetic achievement in population with severe emotional
problems over time.

AUTHORS: Harris & King (1982)
SUBJECTS: 242 children in grades 4 and 5 in public school
settings; students were classified as LP (learning problem, N=33),
BP (behavior problem, N=17), LBP (learning-behavior problem,
N=19), or NP (no problem, N=173).
TASK: Science Research Associates Achievement Tests (SRA),
Children's Personality Questionnaire (CPQ), L-J Sociometric Test
(L-JST).
RESULTS:
1. LP students achieved lower scores on SRA, were less preferred
   by peers, were less intelligent than NP, and less assertive
   than BP and LBP groups.
2. BP did not differ from NP on SRA subtests: Reading, Math,
   Science, Use of Sources, but did differ from all groups on
   Language Arts and Social Studies.
3. BP did not differ from any group sociometrically.
4. LBP did perform lower than all groups on SRA, were preferred
   less by all groups.

AUTHORS: Hisama (1976)
SUBJECTS: 48 special ed. children with learning and behavior
problems; mean CA 108 months (range 96-132); public schools; 3rd
or 4th graders. 48 normal 3rd or 4th graders, free from learning
and behavior problems, randomly selected; mean CA 106 months
(range 90-136).
TASK: Children's Locus of Control Scale (CLCS), Coding Test and
Digit Symbol Test from WISC, Wechsler Adult Intelligence Scale
(WAIS), NIM game (match game).
RESULTS:
No significant difference in CLCS scores between normals and LD
and BD. BD not externally oriented.
Coding Test showed children with internality performed better
than those with externality.
Within experimental group, externally-oriented child responded to
success experience positively and performance depressed under
failure condition.

AUTHORS: Letteri (1979)
SUBJECTS: 200 subjects (some BD, some not).
TASK: Cognitive Profile.
RESULTS: Cognitive profile associated with low academic
achievement and severe behavior problems is: simple, leveler,
intolerant for ambiguous information, global, broad, non-focuser,
and impulsive.

AUTHORS: Motto & Lathan (1966)
SUBJECTS: School-age population of state hospital; 34 boys, mean
age 13 years 1 mo. (range 10-2 to 16-9); 13 girls, mean age 11
years 2 mo. (range 9-3 to 15-1); as group, in dull normal range of
general intelligence.
TASK: Wechsler Intelligence Scale for Children (WISC); Wechsler
Adult Intelligence Scale (WAIS); Stanford-Binet, Form L;
California Achievement Test (CAT), reading and arithmetic.
RESULTS:
1. Uniformity of achievement in reading and arithmetic not
   significantly different.
2. Females, CA 1.4 below expectations in reading; CA 1.6 below
   expectations in arithmetic; MA [illegible] below expectancy in
   reading, and .9 below in arithmetic.
3. Males, CA 2.6 below reading expectancy; CA 3.7 below
   expectancy in arithmetic; MA 1.8 below reading, and 1.9 below
   arithmetic.
4. More pronounced retardation in males.
5. Children in hospital school in excess of 10 months gained in
   reading and arithmetic achievement to extent expected for
   their mental ages.

AUTHORS: Perna, Dunlap, & Dillard (1984)
SUBJECTS: 63 males classified as mildly to moderately ED in
public schools; age range 10-15 years (mean age 12.9 years).
TASK: Intellectual Achievement Responsibility (IAR),
chronological age, Stanford-Binet IQ (S-BIQ) or WISC-R,
California Achievement Test (CAT).
RESULTS:
1. ED students who felt a high degree of self-responsibility for
   their successes and failures showed greater academic gains.

AUTHORS: Reilly, Ross, & Bullock (1979)
SUBJECTS: 177 adolescents adjudicated for specific behavioral
offenses.
TASK: Wechsler Intelligence Scale for Children (WISC-R), Wide
Range Achievement Test (WRAT).
RESULTS:
1. Average WISC-R IQ of 90.26; near average scores on Picture
   Arrangement; lowest scores on Information, Comprehension,
   Vocabulary.
2. Average achievement was deficient in all areas. Arithmetic
   scores were consistently lower than reading; violent offenders
   had the lowest reading scores.
3. A relation between IQ and violent behavior was not found.

AUTHORS: Schroeder (1965)
SUBJECTS: 106 students classified in one of five categories
(psychosomatic, aggressive, school difficulties, school phobia,
neurotic-psychotic personalities); mean age 147.06 months.
TASK: Wechsler Intelligence Scale for Children (WISC), Jastak
Wide Range Achievement Arithmetic, Jastak Wide Range Achievement
Reading (WRAT).
RESULTS:
1. Mean scores consistently lower in arithmetic than reading in
   all five categories.
2. School difficulties category lowest mean achievement level in
   arithmetic and reading.
3. Highest grade equivalent composite mean in neurotic-psychotic
   category.
4. Emotionally disturbed children were retarded from age level
   in school achievement.
5. Educational disabilities concomitant with emotional
   disturbance and vice versa.
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m ( table continues ) 



Scruggs & Mastropieri (in press)

SUBJECTS: 50 BD and 28 LD students in grades [illegible]-4.

TASK: Training test-taking skills relevant to the Stanford
Achievement Test (SAT), reading subtests.

RESULTS:
1. BD and LD students exhibited deficiencies on the SAT reading
   subtests. Test scores improved significantly with training.


Scruggs & Mastropieri (1984)

SUBJECTS: 1,482 LD and BD students in grades 1-3.

TASK: Stanford Achievement Test, all subtests.

RESULTS:
1. Only slight differences between LD and BD groups, with LD
   students consistently higher in achievement.
2. Factor score patterns of LD and BD students were equivalent.


Scruggs, Mastropieri, & Tolfa (1985)

SUBJECTS: 41 LD and 44 BD students in grades 4-6.

TASK: Training test-taking skills relevant to the SAT; reading
and math subtests.

RESULTS:
1. Trained LD and BD students gained on the reading decoding
   subtest relative to controls.
2. Differential gain on the part of trained BD students over
   trained LD students on "math concepts" subtest.


Scruggs, Mastropieri, Tolfa, & Jenkins (1985)

SUBJECTS: 37 BD students and 50 nonhandicapped students,
grades 5-6.

TASK: Test Attitude Scale (TAS).

RESULTS:
1. BD and nonhandicapped students did not differ at the beginning
   of the school year.
2. After three days of testing, BD students reported lower
   attitudes in personal feelings and personal importance of
   tests, but did not differ with respect to attitudes concerning
   fairness of tests.


Stone & Rowley (1964)

SUBJECTS: 82 boys and 34 girls; mean age 12 years; mean IQ 96.52
(range 62-135).

TASK: Wide Range Achievement Test (WRAT), arithmetic and reading
parts; Wechsler Intelligence Scale for Children (WISC).

RESULTS:
1. In reading and arithmetic, majority of children fell below
   level of achievement expected on basis of chronological age.
2. In using mental ages as basis for determining achievement
   level, majority fell below expected level in both reading and
   arithmetic.
3. Emotionally disturbed children lower in arithmetic scores than
   reading scores (significantly).
4. In actual grade placement, larger proportion were in grades
   below that expected on basis of CA.
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AUTHORS 



Tamkin (1960)

SUBJECTS: Children receiving residential treatment for emotional
disorders in psychiatric hospital; 22 boys, mean age 8.7 years;
12 girls, mean age 9.[illegible] years; combined mean age 9.0
years.

TASK: Wide Range Achievement Test (WRAT), arithmetic and reading
parts.

RESULTS:
1. Both arithmetic and reading grade rating within range
   commensurate with mean CA of sample.
2. Difference between grade ratings for reading and arithmetic was
   significant at .005 point based upon one-tailed test (t = 2.91).
3. 32% (n = 11) demonstrated some degree of educational
   disability, 41% (n = 14) were educationally advanced, and
   remaining 27% (n = 9) were at expected grade level, observing
   difference between CA and grade rating.
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' ' j, • Academic Characteristics 

Abstract

The academic performance of 1,482 behaviorally disordered (BD) and
learning disabled (LD) children attending grades 1-3 was compared.
Results indicated that differences in academic performance between
BD and LD students were trivial. In addition, supplementary
analyses indicated that the two groups did not differ with respect
to factor structure of achievement test performance, nor did they
differ with respect to reading/math correlations. Implications
with respect to cross-categorical education are discussed.
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' — n ■ 5 ; • . 

Academic Characteristics of Behaviorally Disordered
and Learning Disabled Students

The issue of cross-categorical versus categorical placement
in special education has been hotly debated in past years (e.g.,
Hallahan & Kauffman, 1976; Hewett & Forness, 1974; Heward &
Orlansky, 1980). This issue is based in part upon a presumed
similarity of academic functioning among children representing
different categories of exceptionality. That is, if students of
different classifications are to be taught in the same classroom,
they should first be shown to be functioning on similar academic
levels. However, if students classified as behaviorally
disordered (BD) can be shown to be functioning on an academic
level different from their learning disabled (LD) counterparts,
then cross-categorical placement may be less defensible. If, on
the other hand, LD and BD children function on similar academic
levels, different arguments against cross-categorical placements
must be voiced.

Recently, Epstein and Cullinan (1983) argued convincingly
that the level of academic functioning of behaviorally disordered
students was, in fact, significantly higher than that of
corresponding learning disabled students. These researchers
matched 16 pairs of learning disabled and behaviorally disordered
students for chronological age, IQ, sex, and ethnicity, and
administered to all students several achievement measures. They
concluded that with chronological age and IQ so matched, BD
students were significantly higher than LD students in all
subtests with the exception of the General Information subtest of
the Peabody Individual Achievement Test (PIAT) and the Math
subtest of the Wide Range Achievement Test (BD students, however,
had scored significantly higher on the Mathematics subtest of the
PIAT). These significant differences amounted to over a one-year
difference in grade level scores, leading the authors to suggest
that "such differences could present problems related to grouping
and other instructional considerations" (Epstein & Cullinan, 1983,
p. 305). They concluded, "these data give no support to the
supposition that the traditional categories of mild-moderate
educational handicaps are highly similar on the characteristics of
academic achievement" (p. 305).

The results of the Epstein and Cullinan investigation
provide valuable information regarding relative achievement
discrepancies of BD and LD students. Some limitations of that
study, however, have been noted by the authors. These include,
among other things, the facts that relatively small samples of
students were employed and that no girls or minority pupils were
included in the sample. To these limitations could be added
another: the conclusions of Epstein and Cullinan refer to only a
small sample of LD and BD students, matched on IQ, and provide
little information concerning academic achievement levels of
large numbers of such students actually enrolled in public school
special education classes.

IQ data have been used frequently in the past to investigate the
academic characteristics of behaviorally disordered students
(Forness, Bennett, & Tose, 1983; Graubard, 1964, 1971; Motto &
Wilkins, 1968). Kauffman (1981) has indicated that use of IQ data
on behaviorally disordered students is critical for effectively
assessing the academic characteristics of this population.
Although matching on IQ with behaviorally disordered and other
populations does provide information regarding relative
discrepancies between ability and academic performance of the
behaviorally disordered population, it does not describe the
actual level of academic performance exhibited by behaviorally
disordered students actually enrolled in special education
classes and how this performance differs from that of their
learning disabled counterparts. The Epstein and Cullinan (1983)
study is most informative regarding the relative ability/academic
performance discrepancy of their sample of the two populations,
but provides little information regarding the direct comparison
of learning disabled and behaviorally disordered students on
measures of academic functioning. The present investigation was
intended to address this issue by examining the achievement test
scores of a large sample of LD and BD children as they were
enrolled in special education classrooms. Through this procedure,
it was thought that evidence could be acquired regarding possible
academic differences in performance between these two
populations.

Method

Data were collected from 1,482 students in grades 1-3
attending special education classrooms in 58 elementary schools in
a western metropolitan area. Of this population, 95% were Anglo,
and 5% represented minority groups including Black, Hispanic, and
Native American; 68.3% (1,012) were males, and 31.7% (470) were
females. Three hundred eighty-two students were attending
first grade, 529 students were attending second grade, and 571
students were attending third grade. Six hundred nineteen (42%)
were classified as LD and 863 (58%) were classified as BD
according to Public Law 94-142 and local criteria. These criteria
included, for LD students, average or above intelligence and a 40%
discrepancy between ability and achievement in two areas of
academic functioning. Criteria for classification as behaviorally
disordered included average or above intelligence and marked
deficits in behavioral and/or emotional functioning, documented by
teacher and psychologist, which had proven resistant to
simpler remediation. No academic criteria were specified for BD
students. One thousand three hundred forty-seven students
(91%) were attending resource room placements, while 135 students
(9%) were attending self-contained classrooms. IQ data for this
population were not available and, in fact, were not solicited for
the purposes of this study. Data were collected on the subjects
for subtests of the 1973 edition of the Stanford Achievement Test
(SAT) (Madden, Gardner, Rudman, Karlsen, & Merwin, 1973). All
test data were collected from the same administration, spring
1983.

Results

Main Analyses

Multivariate analysis of variance (MANOVA) tests were
computed between groups at each grade level, with raw scores from
the SAT subtests as dependent measures. The MANOVA procedure was
used to take into account the high level of intercorrelations
between subtests, and to control for an inflated experiment-wise
alpha level thought likely to result from repeated t tests on
non-independent comparisons (Bock & Haggard, 1968; Kerlinger &
Pedhazur, 1973; Levin, in press; Marascuilo & Levin, 1983; Winer,
1971). Raw scores, rather than grade equivalents or percentiles,
were used because the ratio nature of the numbers was more
appropriate for meeting the assumptions of analysis of variance
(Ferguson, 1968), and because raw scores provide a more precise
measure of test behavior.

Analysis of the data revealed a significant multivariate F
approximation of 5.34, p < .001 for second graders, a significant
multivariate F approximation of 2.20, p < .033 for third
graders, and a nonsignificant multivariate F approximation of
.87, p < .48 for first graders. Visual inspection of the
descriptive data presented in Table 1 indicates that the
achievement scores consistently favor the LD group over the BD
group, although the effect sizes are small enough in all cases to
constitute questionable practical educational importance (Total
Score effect sizes of .14, .18, and .03 for first, second, and
third graders, respectively). As seen in Table 2, these
differences rarely exceed three or four months in grade equivalent
scores.



Insert Tables 1 and 2 about here



The finding of a nonsignificant multivariate effect in the
first grade sample precluded further analysis with univariate
tests (Marascuilo & Levin, 1983). However, univariate t tests
were computed on the second and third grade levels, for which
significant multivariate effects had been found. To control for
the possibility of Type I errors, specific pairwise comparisons
were made at a level of significance appropriate to a familywise
alpha level of .05 for each grade level.¹ In the case of the
seven subtests on the second and third grade levels, the resulting
alpha was .007. By these rather rigid criteria, significant
differences favoring the LD group were nonetheless found at the
second grade level for the Vocabulary, Listening Comprehension,
Social Science, and Science subtests. Differences between groups
in Total Math and Spelling approached significance, but not at the
level required by this analysis. Differences in reading were
negligible, t < 1 in absolute value. At the third grade level, no
comparisons approached significance at the required level, and
four of the seven comparisons resulted in t's < 1 in absolute
value. The fact that a significant multivariate effect but no
univariate effects were found is not uncommon and is doubtless a
result of the fact that the MANOVA takes into account the high
level of correlations between subtests, while the univariate tests
do not (Winer, 1971).
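The pairwise criterion of .007 is what results from dividing the familywise alpha of .05 across the seven subtest comparisons. The Python sketch below reproduces that arithmetic. It assumes a Bonferroni-style division, which the text does not name explicitly, and the function name is illustrative rather than anything used by the authors.

```python
def pairwise_alpha(familywise_alpha: float, n_comparisons: int) -> float:
    """Per-comparison alpha under a Bonferroni-style division.

    Assumption: the familywise alpha is split evenly across comparisons.
    The manuscript reports only the resulting value (.007).
    """
    if n_comparisons < 1:
        raise ValueError("n_comparisons must be at least 1")
    return familywise_alpha / n_comparisons


# Seven SAT subtests compared at each of the second and third grade levels.
alpha = pairwise_alpha(0.05, 7)
print(round(alpha, 3))  # 0.007
```

With seven comparisons the unrounded value is about .0071, which is the .007 criterion reported above.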

Supplementary Analysis

Since statistical differences between BD and LD students were
seen to be few, resulting in small effect sizes, supplementary
analyses were computed to determine whether the patterns of
achievement test performance could be seen to be different for
the two groups. To this end, separate factor analyses were
computed for BD and LD students at each grade level in order to
determine whether the groups differed from each other with respect
to underlying factor structure. Each of the six separate factor
analyses revealed only one factor, accounting for between 81% and
88% of total variance, and indicating that over all subtests, only
one factor was being measured for each group (perhaps a "general
cognitive ability" factor), and that no difference in factor
structure between BD and LD groups was discernible. In a follow-up
analysis, individual correlations were computed between Total
Reading and Total Math subtests for BD and LD students at each
grade level. Resulting correlations ranged from .78 to .88 (all
p's < .01) for all groups. Comparisons made via Fisher's Z
transformations (Ferguson, 1981) at each grade level indicated
that at no point were correlations for BD students statistically
different from correlations for LD students (all p's > .20).
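The Fisher Z comparison described above can be sketched in Python as follows. This is the textbook test for two independent correlations, not a reproduction of the authors' computation. The group-specific correlations are not reported in the text, so the r values in the usage lines are hypothetical and only fall within the reported .78 to .88 range, while the Ns are the second grade group sizes from the table.

```python
import math
from statistics import NormalDist


def compare_correlations(r1: float, n1: int, r2: float, n2: int):
    """Two-sided z test for the difference between two independent
    Pearson correlations, using Fisher's r-to-Z transformation."""
    if n1 <= 3 or n2 <= 3:
        raise ValueError("each sample needs more than 3 cases")
    z1, z2 = math.atanh(r1), math.atanh(r2)      # Fisher's Z = atanh(r)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))   # two-sided p value
    return z, p


# Hypothetical correlations (not reported per group in the text);
# Ns are the second grade BD and LD group sizes.
z, p = compare_correlations(0.82, 323, 0.85, 206)
print(f"z = {z:.2f}, p = {p:.2f}")
```

A large p value from this test indicates that the reading/math correlations of the two groups cannot be distinguished statistically, which is the pattern the authors report.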

/ . • 

Discussion

Results of the present investigation suggest that BD students
do not show better academic performance than do their LD age
peers when academic achievement scores of students actually
attending special education placements are examined. These
findings are in sharp contrast with those of Epstein and Cullinan
(1983), who suggested that academic performance of BD students is
typically higher than that of LD age peers. The reason for these
discrepant findings very likely has to do with the fact that the
Epstein and Cullinan subjects were matched by IQ, while the
subjects in the present investigation represented the total
of a sample of students enrolled in LD and BD classes without
respect to intellectual functioning. While the findings of
Epstein and Cullinan are of theoretical importance in that they
underline differences in performance discrepancies between the two
populations in the sample selected, they do not provide direct
evidence concerning how a large sample of these students actually
functions in classes compared with their learning disabled
counterparts. The conclusions of the present research indicate
that at least at the primary grade levels in the population
sampled, LD and BD children are in fact very similar with respect
to academic performance. Even though statistically significant
differences were found on some comparisons, it must be remembered
that the large sample size resulted in sufficient statistical
power to discern relatively small effect sizes (Cohen, 1968). In
fact, for Total Reading, Total Math, or Total Battery scores,
these differences do not exceed two months in grade equivalent
scores.²
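The point about sample size and power can be illustrated with a rough calculation. The sketch below uses a normal approximation to the power of a two-sided, two-sample comparison. It is an illustration under stated assumptions, not the authors' analysis. The group sizes (323 BD, 206 LD) are the second grade Ns and .007 is the pairwise alpha reported earlier, while the function name and the effect sizes looped over are chosen here for illustration.

```python
import math
from statistics import NormalDist


def approx_power(d: float, n1: int, n2: int, alpha: float) -> float:
    """Approximate power of a two-sided two-sample test to detect a
    standardized mean difference d (normal approximation)."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1.0 - alpha / 2.0)
    # Expected value of the test statistic under the alternative.
    shift = abs(d) * math.sqrt(n1 * n2 / (n1 + n2))
    return nd.cdf(shift - z_crit) + nd.cdf(-shift - z_crit)


# Second grade group sizes and the pairwise alpha from the text.
for d in (0.10, 0.20, 0.30):
    print(f"d = {d:.2f}: power ~ {approx_power(d, 323, 206, 0.007):.2f}")
```

Power rises steeply with effect size at these group sizes, which is consistent with the authors' remark that small effects were detectable in this sample.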

Although the sample size used in this investigation was
relatively large, it should be recalled that the subjects came
from only one geographical area. This fact may present problems
in generalization of findings. However, it must also be
maintained that the standards for inclusion in special education
placement in this area are very similar to criteria used around
the country. In fact, these criteria make the findings more
surprising in that specific ability/performance discrepancies in
areas of academic functioning are necessary requirements for LD
placement, while they are not for BD placement. Nevertheless, the
strong similarities between the two groups indicate that, for one
reason or another, many LD and BD students in the primary grades
apparently do function on a highly similar academic level. This
finding does not support the assertion of Cullinan, Lloyd, and
Epstein (1981) that academic deficits may be minimal in the
primary grades and increase with age. It was found, however, that
the variability of BD student performance descriptively exceeded
that of LD students at all grade levels. Such higher levels of
variability on the part of BD students have been reported by
Forness et al. (1983). Although the relatively higher descriptive
level of variability here may simply be an artifact of the fact
that an academic cutoff level was operating for LD but not BD
students, it does suggest that a special education teacher may
expect to find a wider range of academic achievement among BD
students.

In contrast to the Epstein and Cullinan (1983) investigation,
no evidence is given by these data that academic programming
should proceed differentially for the two groups. However, the
fact that two groups are functioning at a similar academic level
does not necessarily mean that instructional procedures should be
the same. It may be, for example, that the BD group may be more
responsive to token economies and direct instruction in
independent study strategies, while the LD group may be more
responsive to peer tutoring and small-group teacher-led direct
instruction procedures. At present, however, it must be concluded
that little is known about optimal instructional strategies for LD
vs. BD children, and it is the opinion of the present authors that
research is greatly needed in this area.

The reason these two supposedly discrepant groups function at
such a similar level of academic performance is uncertain, and
cannot be given on the basis of the data presented here. It has
often been stated in practice by those who work with LD and BD
children that the causal link between behavior problems and
learning disabilities is a strong one whose directionality is
often in question. It may be that the causal relation between
learning and behavioral disabilities is of sufficient strength
that academic shortcomings are a frequent consequence, regardless
of the nature of special education classification.

r 

In spite of the apparent discrepancies between the present
investigation and the Epstein and Cullinan (1983) study, the
authors would like to end on a note of concordance with those
researchers. In our view, Epstein and Cullinan are quite correct
in their assertion that effectiveness of service is a much higher
priority than the categorical versus cross-categorical nature of
that service, an assertion for which empirical support is
available (Heller, Holtzman, & Messick, 1982). Although the
present data suggest that cross-categorical placement may be
advisable, the present authors would rather see effective
educational programming in categorical settings than ineffective
teaching in cross-categorical settings. It is thought, however,
that the search for optimal educational settings can parallel the
search for optimal educational strategies within such settings,
and it is to these ends that the present research was addressed.
¹It can be argued that multiple t tests on non-independent
data sets do not inflate the Type I error probability as much in
actual practice as expected by statistical theory, and in fact,
some recent Monte Carlo studies have supported this argument
(Bernhardson, 1975; Carmer & Swanson, 1973; White, 1984). The
decision made here was to use the more conservative procedure,
especially considering the fact that the large sample size allowed
sufficient power to detect relatively small differences even when
the pairwise alpha level was quite small.

²A case may be made that although academic functioning
appears similar given a static achievement test measure, the
populations may differ with respect to rate of learning. If this
were true, however, one would expect the BD students to begin to
surpass the LD students academically by the second or third grade.
Such differences over grade levels, however, were not observed.
                            BD          BD Grade     LD          LD Grade               Effect
Subtest                     Percentile  Equivalent   Percentile  Equivalent   t*        size

First grade: BD (N = 253), LD (N = 129)

Total reading               [illegible] 1.4          23          1.5          +         -.09
Total math                  18          1.3          24          1.4          +         -.15
Vocabulary                  23          1.0          30          [illegible]  +         -.16
Listening comprehension     16          K.6          22          K.9          +         -.16
Total                       12          1.1          18          1.3          +         -.14

Second grade: BD (N = 323), LD (N = 206)

Total reading               [illegible] 2.0          28          2.0           .55      -.05
Total math                  26          1.9          34          2.1          2.16      -.20
Vocabulary                  16          1.8          26          2.2          2.95**    -.27
Listening comprehension     [illegible] 1.3          20          1.9          3.31**    -.30
Spelling                    12          1.6          16          1.8          1.99      -.18
Social science              14          2.0          28          2.2          3.43**    -.31
Science                     14          1.5          24          2.1          3.10**    -.29
Total                       16          1.6          25          1.8          1.99      -.18

Third grade: BD (N = 287), LD (N = 284)

Total reading               23          2.5          24          2.5           .14      -.01
Total math                  16          2.9          18          3.0          1.51      -.13
Vocabulary                  24          2.5          31          2.9          2.00      -.17
Listening comprehension     20          2.5          24          2.8          1.61      -.14
Spelling                    13          2.5          12          2.5          -.57       .05
Social science              22          2.8          22          2.8           .50       .04
Science                     12          2.4          16          2.6          -.08       .01
Total                       24          2.7          24          2.7           .95      -.08

*All t statistics were computed on raw scores.
**Statistically significant at the pre-specified probability level, p < .007.
+Because of a non-significant multivariate effect, univariate statistics were
not computed.
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Learning Disabled Students 
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How are your test-taking skills? 

1. The short story, "The Four Seasons," is about:

   a. vegetation in North America
   b. wind currents and their effects
   c. the changing weather
   d. the growth process

2. The greatest advantage of using slent in the manufacture of steel is
   that slent makes steel

   a. transparent
   b. stainless
   c. heavy
   d. bulky

3. The Japanese game of paduki

   a. can only be played by the Imperial Family
   b. is sometimes played indoors
   c. can never be played for more than 30 minutes
   d. is always played at every celebration

4. When Bestor crystals are added to water

   a. heat is given off
   b. the temperature of the solution rises
   c. the solution turns blue
   d. the container becomes warmer



The reasoning strategies are explained, followed by the correct answer:

1. The convergence strategy (stem), recently described by Smith (1982),
   involves teaching test-takers to examine all choices presented after
   the stem of a multiple-choice question in order to analyze the
   relationships of the distractors to each other and, thereby, identify
   the choice most likely to be correct. (1. c)

2. Absurd options can be eliminated as incorrect choices, thus
   increasing the probability of choosing the correct answer (Gibb,
   1964). (2. b)

3. Specific determiners (e.g., always, never, all) are words which
   provide cues to the likely correctness of choices, especially on
   true/false items (Slakter, 1970). (3. b)

4. Identifying similar (but slightly different) options again narrows
   down the possibility of choosing incorrect answers (Millman,
   1969). (4. c)
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Should guessing and answer changing be 
encouraged? 

Usually students are advised not to guess on standardized multiple-
choice tests. However, according to Hammerton (1965) and Bauer (1973),
testwise students tend to guess more often than their naive
counterparts and, as a result, obtain higher scores. Thus, an
appropriate guessing strategy should be employed.

Ebel (1965) concludes from his study with true/false tests that
"students seeking highest scores on a test are well advised to answer
all questions even when the usual correction is applied (their blind
guesses to true/false tend to be correct more than half of the time)."
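Ebel's advice can be checked with a little arithmetic. The sketch below assumes that "the usual correction" means the standard correction-for-guessing formula, under which each wrong answer on a k-option item costs 1/(k - 1) of a point. That reading is an assumption, since the formula is not stated here, and the function name is illustrative.

```python
def expected_gain_per_guess(p_correct: float, n_options: int) -> float:
    """Expected corrected score from guessing on one item, relative to
    leaving it blank (which scores 0).

    Assumes the standard correction: +1 for a right answer and
    -1/(n_options - 1) for a wrong answer.
    """
    penalty = 1.0 / (n_options - 1)
    return p_correct - (1.0 - p_correct) * penalty


# Purely blind guessing on a true/false item breaks even under the correction.
print(expected_gain_per_guess(0.5, 2))       # 0.0
# If guesses are right more than half the time, guessing pays off.
print(expected_gain_per_guess(0.6, 2) > 0)   # True
```

Under this formula a student loses nothing on average by guessing blindly and gains whenever partial knowledge lifts the chance of a correct answer above 1/k.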

The problem to solve now becomes "How does a test-taker decide 
which answer is the best guess?" Numerous testwiSeness suggestions 
are provided by Millman's (1969) and Smith's (1982) guidelines. 

Beck (1978) studied the effect of changing item responses on scores 
of elementary school children on a standardized achievement test. 
Results clearly indicated that response changes on multiple-choice 
items tend to improve test scores. 

In spite of conventional wisdom regarding guessing and answer
changing, research evidence indicates that:

- Students should answer all questions, even when guessing is
  penalized.

- Students should be encouraged to change any answer they have
  had second thoughts about.
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Do separate answer sheets inhibit the 
performance of learning disabled students?

Yes, according to a recent study performed at Utah State University.
LD and nondisabled students were given three subtests of the
Comprehensive Tests of Basic Skills (CTBS) for which correct answers
were identified in the test book. Students were instructed to record
the correct answers on the separate answer sheet as quickly and
efficiently as possible. Learning disabled students' performance was
found to be slower, less accurate, and less neat than that of their
nonhandicapped peers. Figure A shows differences between LD and
regular classroom students with respect to accuracy and fluency on
completion of the separate answer sheet. This discrepancy could
contribute to measurement error in the LD population. However, it
would also seem that LD students improved appreciably in use of
separate answer sheets with practice. Figure B shows the increase in
fluency and accuracy of LD students after only three practice
sessions with teacher feedback.
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Are learning disabled students deficient 
in test-taking skills? If so, do learning
disabled students benefit from training?

Yes, learning disabled students are deficient in test-taking skills.
Scruggs (1984, 1985) found LD students differed from their
nonhandicapped peers with respect to use of appropriate strategies on
standardized achievement tests. These strategy deficits included use
of prior knowledge, use of deductive reasoning skills, attention to
appropriate distractors, and selection of strategies appropriate to
correctly answering different types of items.

Recently, LD students have been trained in using appropriate
test-taking strategies. Results indicated that test scores of trained
students improved as much as 8-10 percentile points on reading
achievement tests over untrained control students (Scruggs &
Mastropieri, in press). In addition, a separate investigation revealed
that students' attitude toward tests qualitatively improved as a
result of training.





What should LD students be taught about
test taking?

Our recent research indicates that LD students benefit most from
extended, guided practice and general familiarity with test
conventions and formats. To this end, LD students should be given
relevant practice with questions and formats similar to those which
they will see on achievement tests. (Students, of course, should not
be given the exact items they will be tested on.)

In addition, the following strategies have been successfully taught
to LD students and have been effective in improving test scores:

1. Never skip an answer.

2. Be certain to attend to all distractors and refer to the reading
   passage, even if you are "very sure" your answer is correct.

3. If you are having great difficulty reading a passage, read the
   questions and try to answer them anyway. If you have difficulty with
   some words in the questions or distractors, answer anyway and base
   your answers on the words you can read.

4. If you have attended to all parts of a passage and test question and
   still do not know an answer, there is still a good chance of getting
   the correct answer if you guess.

5. Be certain you are attending to the appropriate stimulus, such as the
   underlined sound in a "word study skills" subtest. As in other
   subtests, wrong answer choices are given which may look correct at
   first glance.

6. Make sure you answer every item, even if you must hurry and guess
   a lot near the end. You will probably get some of the answers correct.

Examples and practice activities will help develop these test-taking
skills.
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